DOCUMENT RESUME

ED 455 642                                      EC 308 530

AUTHOR          Miller, Phyllis, Ed.
TITLE           What Does Being a G Wiz Mean in Real Life?
INSTITUTION     American Mensa Education and Research Foundation, Arlington,
                TX.
PUB DATE        2001-00-00
NOTE            97p.; Theme issue. Published three times a year.
AVAILABLE FROM  Mensa Education and Research Foundation, 1229 Corporate Dr.
                West, Arlington, TX 76006-6103. Tel: 973-655-4225; Fax:
                973-655-7382; e-mail: millerp@mail.montclair.edu.
PUB TYPE        Collected Works - Serials (022)
JOURNAL CIT     Mensa Research Journal; v32 n2 2001
EDRS PRICE      MF01/PC04 Plus Postage.
DESCRIPTORS     Admission (School); Adults; *Aging (Individuals); *Gifted;
                *Graduate Study; *Higher Education; *Intelligence Tests;
                Knowledge Level; *Life Satisfaction; Longitudinal Studies;
                Older Adults; *Well Being
IDENTIFIERS     Graduate Record Examinations



ABSTRACT 



The articles in this journal issue examine three different
aspects of gifted adulthood. In "Self-Appraisal, Life Satisfaction, and
Retrospective Life Choices across One and Three Decades" (Carole K. Holahan
and others), 383 children who were part of Terman's original study of the
gifted in 1921 were revisited at various points in their lives. Participants
who reported living up to their intellectual abilities were higher in overall
life satisfaction and were less likely to report that they would make
different choices in work or family life three decades later. "Consequences
of How We Define and Assess Intelligence" (Wendy J. Williams) considers some
of the consequences of how intelligence is defined and assessed in young
adults. In particular, it discusses the Graduate Record Examination and its
usefulness in predicting success in graduate school. The use of intelligence
tests in general is based on a certain definition of intelligence, and the
article argues that such a definition is not necessarily what is needed to
determine success in school. The last article, "The Locus of Adult
Intelligence: Knowledge, Abilities, and Nonability Traits" (Phillip L.
Ackerman and Eric L. Rolfhus), draws the distinction between general
intelligence and knowledge, and studies the relationship of both to the aging
process. (Articles include references.) (CR)








What Does Being a G Wiz 
Mean in Real Life? 



Phyllis Miller, Editor 



Mensa Research Journal 
Volume 32, Number 2 



U.S. DEPARTMENT OF EDUCATION
Office of Educational Research and Improvement
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

□ This document has been reproduced 
as received from the person or 
organization originating it. 

□ Minor changes have been made to 
improve reproduction quality 



Points of view or opinions stated in 
this document do not necessarily 
represent official OERI position or 
policy. 



PERMISSION TO REPRODUCE AND 
DISSEMINATE THIS MATERIAL HAS 
BEEN GRANTED BY 



Miller 



TO THE EDUCATIONAL RESOURCES 
INFORMATION CENTER (ERIC) 


Published by Mensa Education & Research
Foundation and Mensa International, Ltd.™

Mensa Research Journal

What does being a g/wiz mean in real life?

Vol. 32, No. 2

Table of Contents 

The Mensa Education & Research Foundation

Editor’s Preface

Self-Appraisal, Life Satisfaction, and Retrospective Life
Choices Across One and Three Decades
by Carole K. Holahan, Charles J. Holahan, and Nancy L. Wonacott

Consequences of How We Define and Assess Intelligence
by Wendy J. Williams

The Locus of Adult Intelligence: Knowledge, Abilities, and
Nonability Traits
by Phillip L. Ackerman and Eric L. Rolfhus



© 2001 Mensa Education & Research Foundation 





Mensa Research Journal is published three times a year, in conjunction 
with Mensa International, Ltd., by the Mensa Education & Research 
Foundation, Dr. Michael Jacobson, president, 1840 N. Oak Park Ave., 
Chicago, IL 60635-3314. 

Staff: 

Editor • Phyllis Miller, 23 Lexington Road, Somerset, NJ 08873, 
MRJ@merf.us.mensa.org 

Associate Editor • Francis Cartier, 1029 Forest Ave., Pacific Grove, CA 
93950, fcar889755@cs.com 

Circulation Manager • Patty Wood, American Mensa, Ltd., 1229
Corporate Drive West, Arlington, TX 76006-6103,
PattyW@AmericanMensa.org.

Production Manager • Annette L. Kovac, American Mensa, Ltd., 1229
Corporate Drive West, Arlington, TX 76006-6103,
AnnetteK@AmericanMensa.org.



Mensa Research Journal 
Editorial Advisory Board 

Francis Cartier, Ph.D. 

Ilene Hartman-Abramson, Ph.D. 
Michael H. Jacobson, Ph.D. 
Ken Martin, Ph.D. 
Caroline Mossip, Psy.D. 
Charles A. Rawlings, Ph.D. 
Abbie F. Salny, Ed.D. 
Hirsch Lazaar Silverman, Ph.D. 



Each article reprinted in MRJ is reprinted with permission of the
copyright holder.

The Journal is published three times a year. Subscribers will receive all issues 
for which they have paid, even if frequency of publication varies. Membership in 
Mensa is not required. To subscribe, order back issues, or report address 
changes, write to Mensa Research Journal, 1229 Corporate Drive West, 
Arlington, TX 76006-6103. 







The Mensa Education & Research Foundation 



Mensa, the high IQ society, provides a meeting of the minds for people 
who score in the top 2 percent on standardized IQ tests. As an interna- 
tional organization with thousands of members worldwide, Mensa seeks 
to identify and foster human intelligence; encourage research in the 
nature, characteristics and uses of intelligence; and provide a stimulating 
intellectual and social environment for its members. 

The first two of these purposes are largely carried out by the Mensa 
Education & Research Foundation (MERF). MERF is a philanthropic, 
nonprofit, tax-exempt organization funded primarily by gifts from Mensa 
members and others. MERF awards scholarships and research prizes, 
publishes the Mensa Research Journal, and funds other projects consis- 
tent with its mission. 

For more information about the Mensa Education & Research
Foundation, write to MERF, 1229 Corporate Drive West, Arlington,
TX 76006-6103. For information about joining American Mensa, call
1-800-66-MENSA. See also www.us.mensa.org. If you reside outside the
U.S., see the Mensa International, Ltd., Web site at www.mensa.org.



To renew your subscription or subscribe to the Mensa Research 
Journal, just fill out the form on the next page (or a photocopy of it)
and mail it with a check in the appropriate amount. Please refer to 
your mailing label now to determine how many issues of the MRJ you 
will receive before your subscription expires. 







To: MERF, 1229 Corporate Drive West, Arlington, Texas 76006-6103 



Please enter my subscription to the Mensa Research Journal. 

□ New □ 3 issues U.S. $21 (outside U.S. $25) 

□ Renewal □ 6 issues U.S. $42 (outside U.S. $50) 

□ Sample Mensa Research Journal U.S. $7 (outside U.S. $9) 

Subscription amount enclosed $ . 

Please make payments in U.S. funds. 

I would like to make the following tax-deductible contribution to MERF. 



□ Contributor ($25 to $99) $

□ MERF Donor ($100-249) $

□ Bronze Donor ($250-499) $

□ Silver Donor ($500-999) $

□ Gold Donor ($1,000-2,499) $

□ Platinum Donor ($2,500+) $

I would like my contribution to be a □ Memorial honoring

□ Tribute honoring

Please send acknowledgement or notice to

Address

City/State/Zip

Allocation (if desired)

□ General support

□ Scholarships

□ Gifted Children Programs

□ Addition to endowed fund named

Name and address of contributor/subscriber:

Name

Address

City/State/Zip

Member #







Editor’s Preface 



A great deal of attention is given to gifted children these days, and rightly so. 
As the twig is bent, etc., etc., and those of us who have gifted children or were 
gifted kids ourselves know that gifts can be stunted if they are not properly nur- 
tured. However, much less attention is given to what happens when we grow up. 
The three articles in this issue of the Mensa Research Journal examine three 
different aspects of gifted adulthood. 

Do you think you have lived up to your intellectual abilities? Do you think 
maybe you should have chosen a more intellectually demanding profession? At 
the age of 80, when you look back on your life, will you be satisfied? These are 
the questions Carole Holahan, Charles Holahan, and Nancy Wonacott sought to 
answer when they revisited the children who were part of Terman’s original
study of the gifted. Lewis Terman began a study of a group of gifted children in
1921, and the Terman Study of the Gifted is now the oldest and most complete
study of the human life cycle. These gifted people have been interviewed at
various points in their lives, and their thoughts about how they have lived and
the choices they made are fascinating reading.

Wendy Williams considers some of the consequences of how intelligence is 
defined and assessed in young adults. In particular, she discusses the Graduate 
Record Examination (GRE), which is used extensively in this country in the 
graduate school admissions process, and its usefulness in predicting success in 
graduate school. The use of intelligence tests in general is based on a certain 
definition of intelligence, and Williams argues that such a definition is not nec- 
essarily what is needed to determine success in school. 

In the final piece, Phillip Ackerman and Eric Rolfhus draw the distinction
between general intelligence and knowledge, and study the relationship of both 
to the aging process. Why is it that when you play Trivial Pursuit™ with your 
kids, you always win? They may be intelligent, more intelligent than you, but 
you have greater knowledge. I’m always amazed at my children’s lack of 
knowledge — they are in their 30s, and there is so much they don’t know. It 
occurs to me that when I was in my 30s, my parents felt the same way about 
me! 



Phyllis Miller 
Editor 




Self-Appraisal, Life Satisfaction, and Retrospective 
Life Choices Across One and Three Decades 



Carole K. Holahan, Charles J. Holahan, and Nancy L. Wonacott,
University of Texas at Austin

This research investigated the relationship of a self-appraisal of having lived 
up to one’s intellectual abilities at midlife (average age of 49 years) with life
satisfaction and retrospective life choices one and three decades later among 
383 participants in the Terman Study of the Gifted. Study 1 showed that partici- 
pants who reported living up to their intellectual abilities were higher in satis- 
faction with occupational success, satisfaction with family life, and joy in living 
11 years later. Study 2 showed that participants who reported living up to their 
abilities were higher in overall life satisfaction and were less likely to report 
that they would make different life choices in work or family life three decades 
later. In an integrative structural equation model, the relation between the 
midlife self-appraisal of having lived up to intellectual abilities and overall sat- 
isfaction at age 80 was mediated by life satisfaction discrepancy at age 61. 

Self-concept theorists increasingly view the self as comprised of a variety of 
representations (Markus & Wurf, 1987). Because the self-concept conveys
representations of one’s actual and ideal self (Higgins, 1987), it can shape affective
and motivational outcomes in a powerful and enduring way (Ross & Conway, 
1986). The present research examined a component of the self-concept likely to 
play a key role in adult development — a midlife appraisal of having lived up to 
one’s intellectual abilities — among members of the Terman Study of the Gifted 
(Terman, 1925). The research examined several aspects of life satisfaction 
approximately one and three decades later and explored the relation of later life 
satisfaction with alternative life choices. 

One aspect of the self-concept, that which reflects the “self that might have 
been,” is a topic of emerging interest (Landman & Martis, 1992; Landman, 
Vandewater, Stewart, & Malley, 1995). Markus and Wurf (1987) theorized that 
the self one would like to be provides a conceptual anchor for evaluating one’s 
current self. Similarly, Higgins and colleagues (Higgins, 1987; Higgins, Bond, 
Klein, & Strauman, 1986) have emphasized that discrepancies between one’s 
actual and one’s ideal self relate to disappointment and dissatisfaction. 

Markus and Wurf (1987) also proposed that a central feature of the self-concept
is its motivational function. For example, they theorized that the self one
would like to be operates as an incentive. In a similar vein, Markus and Nurius
(1986) proposed that one’s possible selves may be construed as “the cognitive
component” of motivation (p. 954). Higgins (1987) likewise suggested that
frustration from unfulfilled desires underlies the motivational aspect of the
actual-ideal self discrepancy. The negative emotion cued by life regrets, in turn,
motivates further efforts to cognitively undo aversive events (Roese & Olson,
1995).

Correspondence concerning this article should be addressed to Carole K. Holahan, Department
of Kinesiology and Health Education, University of Texas at Austin, Bellmont Hall 222 (D3700),
Austin, TX 78712. Electronic mail may be sent to c.holahan@mail.utexas.edu.

Reprinted from Psychology and Aging, Vol. 14, No. 2, 1999.

Individuals’ reflections on the self that might have been may be especially 
salient at the midlife transition. For example, Helson (1992) has found midlife 
to be an important time for the revision of women’s self-conceptualizations. 
Levinson’s (1978, 1996) theories of the male and female life cycles also empha- 
size midlife as a critical time for reassessment. Moreover, the consequences of 
life regrets may be especially apparent in the aging years. In a study of older 
persons, Erikson and his colleagues (Erikson, Erikson, & Kivnick, 1986) found 
that many of the individuals in their study were engaged in a positive reassess- 
ment of their earlier life in aging, which enabled them to successfully balance 
feelings of ego integrity and a sense of despair. 

The purpose of the present research was to investigate the relationship of a 
midlife appraisal of having lived up to one’s intellectual abilities with (a) life 
satisfaction assessed approximately one decade later and (b) both life satisfac- 
tion and life choices individuals would make differently assessed three decades 
later. Participants were 188 men and 195 women in the Terman Study of the 
Gifted (Terman, 1925). The midlife appraisal was made when participants, who 
were an average of 49 years of age, answered a question asking whether they 
had lived up to their intellectual abilities. Life satisfaction in several areas was 
indexed at an average age of 61. In addition, overall life satisfaction and life 
choices that participants would make differently if they could live their lives 
again were measured at an average age of 80. 

The Terman Study of the Gifted is the oldest and most complete study of the
human life cycle (see Holahan, 1984, 1988; Holahan & Sears, 1995; Oden,
1968; P. S. Sears & Barbee, 1977; R. R. Sears, 1977; Terman, 1925). It was
begun in 1921 by Lewis Terman and eventually included 1,528 gifted children.
The average age of the core sample selected in 1921 was 11. The study has been
extended over the ensuing decades, with the latest data collection in 1996.

The issue of living up to one’s intellectual abilities is particularly relevant to 
a sample of individuals selected for their high intelligence. The criterion for 
selection was a minimum IQ of 135. The individuals in the Terman sample 
achieved high levels of education relative to others at that time, with about 76 
percent of men and 70 percent of women graduating from college, compared 
with eight percent in the general population (Holahan & Sears, 1995). Over the 
years, the data collections in the Terman study have emphasized achievement. 
They have routinely requested information on education and career pursuit, as 
well as honors received in occupational, community, or other contexts. 







Study 1 



From the Terman study archival data, there was an opportunity to examine 
the participants’ midlife assessments of their having lived up to their intellectual 
abilities as reported in 1960. The archives also afforded an opportunity to view 
the affective correlates of this appraisal. In 1972, the study participants were 
asked to evaluate their life satisfaction in the occupational and family spheres, 
as well as their joy in living. The areas of work and family were chosen for 
analysis in the present study because of their central roles in theories concerning 
the development of a life structure (e.g., Erikson et al., 1986; Levinson, 1978). 

Based on conceptualizations of the affective correlates of the “self that might 
have been” (Higgins, 1987; Higgins et al., 1986; Markus & Wurf, 1987), we 
hypothesized that individuals who reported in midlife that they had not lived up 
to their intellectual abilities would score lower on all three of these indexes of 
life satisfaction 12 years later. Based on traditional gender roles and the differ- 
ential occupational opportunities open to the Terman men and women (Holahan 
& Sears, 1995), it was also expected that the subjective assessment of having 
lived up to one’s intellectual potential would be more tied to satisfaction with 
occupational success for the men than for the women. In addition, reasoning 
that life regrets would predict subsequent perceived goal-related discrepancies, 
we hypothesized that individuals who reported in midlife that they had not lived 
up to their intellectual abilities would score lower on indexes of life satisfaction 
discrepancy (i.e., satisfaction adjusted for goal importance) pertaining to occu- 
pational and family life and joy in living 12 years later. Moreover, we predicted 
that these relations would be independent of prior mental health and objective 
achievement and would hold controlling for these variables. 

Method 



Participants 

Some overall selection criteria pertained to both Studies 1 and 2. Participants 
in both studies were members of the Terman Study of the Gifted who responded 
to a question tapping the self-appraisal of having lived up to their intellectual 
abilities in the 1960 survey when they were an average of 49 years of age. To 
ensure a more homogeneous age sample and consistency across the two studies,
participation in both studies was restricted to individuals who were at least 75 
years of age at the 1992 survey and who had responded in 1972. The age 
restriction excluded 42 younger members of the Terman Study from the present 
analyses. These procedures resulted in highly comparable samples across the 
two studies. Among individuals who responded to the 1992 survey, the focus of 
Study 2, 91 percent had also responded to the 1972 survey, the focus of Study 1.

It should be noted that attrition has made the sample more select in some
areas. Participants who have remained in the study are similar to those who left
the study in terms of IQ and socioeconomic status of family of origin. However,
the continuing sample has more education, better health, and greater
occupational success (men only) than those who left the study (Holahan & Sears, 1995). It
would seem, however, that by restricting the range in the sample such attrition
would have made the analyses reported here more conservative.

The maximum number of participants for whom data were available for the 
present analyses was 383 (188 men and 195 women). The number of partici- 
pants in some analyses was less than 383 due to missing values on some vari- 
ables. 

The return rate for the 1972 survey was 75%. At the 1972 survey, participants
were a mean age of 61.

Measures 

Appraisal of having lived up. In 1960, the participants were asked, “On the
whole, how well do you think you have lived up to your intellectual abilities?
Don’t limit your answer to economic or vocational success only.” Response
options varied along a 5-point scale, ranging from 1 (Consider my life largely a
failure) to 5 (Fully). Responses were coded into two categories: not lived up
versus lived up. The not lived-up category included responses of “consider my
life largely a failure,”¹ “far short,” and “considerably short.” The lived-up
category included “reasonably well” and “fully.”
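The two-category coding described above can be sketched in a few lines of Python. This is only an illustration: the function and variable names are hypothetical, and the assignment of the values 2, 3, and 4 to the three middle options is an assumption, since the text states only the scale endpoints.

```python
# Hypothetical mapping of the 5-point response scale. Only the endpoints
# (1 and 5) are stated in the text; the numbering of the middle options
# is assumed from the order in which the text lists them.
RESPONSE_LABELS = {
    1: "consider my life largely a failure",
    2: "far short",
    3: "considerably short",
    4: "reasonably well",
    5: "fully",
}


def code_lived_up(response: int) -> str:
    """Collapse a 1-5 response into 'not lived up' or 'lived up'."""
    if response not in RESPONSE_LABELS:
        raise ValueError(f"response must be 1-5, got {response!r}")
    # Responses 4 and 5 form the lived-up category; 1-3 do not.
    return "lived up" if response >= 4 else "not lived up"


coded = {label: code_lived_up(value) for value, label in RESPONSE_LABELS.items()}
```

Under these assumptions, `coded["reasonably well"]` is `"lived up"` and `coded["far short"]` is `"not lived up"`.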

Life satisfaction. In 1972, the participants were asked to rate their
satisfaction with their life experience in several domains. Three life satisfaction
domains were analyzed in the present study: occupational success, family life,
and joy in living. In each area, participants were asked to check one of the
following five response alternatives: 1 = found little satisfaction in this area; 2 =
on the whole, somewhat dissatisfied; 3 = had a mixed experience but am not
discontented; 4 = had a satisfactory degree of success; and 5 = had excellent
fortune in this respect. (For examples of research using these items, see Holahan
& Sears, 1995; P. S. Sears & Barbee, 1977; R. R. Sears, 1977.)

Life satisfaction discrepancy. In 1972, the participants were also asked to
rate the importance of their life goals in the plans they made for themselves in
early adulthood in each of the life satisfaction domains of occupational success,
family life, and joy in living. In each domain, participants were asked to check
one of the following four response alternatives: 1 = less important to me than to
most people, 2 = looked forward to a normal amount of success in this respect,
3 = expected a good deal of myself in this respect, and 4 = of prime importance
to me, was prepared to sacrifice other things for this. A life satisfaction
discrepancy score in each of the three domains was computed by subtracting the
importance of each domain from its satisfaction score. The discrepancy scores had a
range of -3 to 4, with lower scores representing less satisfaction relative to goal
importance.

¹ Only 1 participant chose the response option “consider my life largely a failure.”
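The discrepancy score lends itself to a short worked check. The sketch below, with hypothetical function and variable names, subtracts a 1-4 importance rating from a 1-5 satisfaction rating and enumerates every combination to confirm the stated range of -3 to 4.

```python
def discrepancy(satisfaction: int, importance: int) -> int:
    """Life satisfaction discrepancy: satisfaction (1-5) minus goal importance (1-4)."""
    if not 1 <= satisfaction <= 5:
        raise ValueError("satisfaction must be between 1 and 5")
    if not 1 <= importance <= 4:
        raise ValueError("importance must be between 1 and 4")
    return satisfaction - importance


# Enumerate all possible rating pairs to find the attainable range.
all_scores = [discrepancy(s, i) for s in range(1, 6) for i in range(1, 5)]
score_range = (min(all_scores), max(all_scores))
```

The lowest score arises when a goal of prime importance (4) is paired with little satisfaction (1), and the highest when a goal of little importance (1) is paired with excellent fortune (5).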

Occupational level. Participants’ occupations were rated according to
Duncan’s socioeconomic index (Miller & Miller, 1977). Occupations listed in
1972 were used primarily in assigning occupational ratings. Where necessary,
information from other surveys (1960 or 1977) was used to substantiate or
clarify ratings. In the present research, occupations were coded into one of three
levels: 1 = lower-level occupations (Duncan scores of 0-610), 2 = administrative
and minor professional (Duncan scores of 617-771), and 3 = major professional
(Duncan scores of 774-960). Homemakers were classified in the first level.

Mental health. In 1960, cumulative ratings of mental health were made for
each Terman study participant (see Oden, 1968). All ratings utilized multiple
sources of information from each follow-up survey since 1940, such as
“personal conferences with the subject or members of his family by the research staff,
responses by the subjects to questionnaire inquiry, reports by parents and
spouses of the subjects, and letters or other personal communications from the
subjects or other qualified informants” (Oden, 1968, p. 8). Based on this
information, each participant’s mental health was coded into one of three levels, with
higher scores reflecting poorer adjustment: 0 = satisfactory adjustment (e.g.,
only minor and realistic anxieties), 1 = some difficulty in adjustment (e.g.,
psychiatric or other help sought), and 2 = serious difficulty in adjustment (e.g.,
interference with marriage, occupation, or social relationships or psychiatric
hospitalization). (For a recent application of these mental health data, see Martin
et al., 1995.)



Results 



Predictors of Lived Up 

Initially, we examined three variables that might be predictively related to
the lived-up self-appraisal: the three-level Duncan socioeconomic index, gender,
and the three-level 1960 cumulative measure of mental health. A 2 x 3 x 2 x 3
(Lived Up x Occupational Level x Gender x Mental Health) hierarchical log
linear analysis contained the four main effects and the following pairwise
interactions: Lived Up x Occupational Level, Occupational Level x Gender, Lived Up
x Gender, and Lived Up x Mental Health. The goodness of fit for the model was
satisfactory, G²(22) = 28.35, p = .164, N = 383.
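The 22 degrees of freedom reported for the goodness-of-fit statistic can be recovered by counting cells and fitted parameters. The following sketch uses hypothetical variable names and assumes the standard parameter count for a hierarchical log linear model (one intercept, k - 1 parameters per main effect, and the product of those counts for each two-way interaction).

```python
from math import prod

# Number of levels of each factor in the 2 x 3 x 2 x 3 table.
levels = {"lived_up": 2, "occupation": 3, "gender": 2, "mental_health": 3}

# The four pairwise interactions named in the text.
interactions = [
    ("lived_up", "occupation"),
    ("occupation", "gender"),
    ("lived_up", "gender"),
    ("lived_up", "mental_health"),
]


def residual_df(levels: dict, interactions: list) -> int:
    """Cells in the table minus the number of fitted parameters."""
    cells = prod(levels.values())
    main_effects = sum(k - 1 for k in levels.values())
    two_way = sum((levels[a] - 1) * (levels[b] - 1) for a, b in interactions)
    return cells - (1 + main_effects + two_way)


model_df = residual_df(levels, interactions)
```

With 36 cells, 6 main-effect parameters, 7 interaction parameters, and the intercept, the residual degrees of freedom come to 22, matching the reported statistic.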

Follow-up chi-square analyses indicated that a greater proportion of
individuals in higher-level occupations responded that they had lived up to their
intellectual abilities, χ²(2, N = 383) = 8.88, p < .05. In addition, a greater proportion
of men than women said that they had lived up to their intellectual abilities
(70.2 percent of men as compared with 59.5 percent of women), χ²(1, N = 383)
= 4.82, p < .05. Finally, the proportion of lived-up responses was positively
associated with the mental health rating, χ²(2, N = 383) = 11.26, p < .01.
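As a check on the reported significance levels, the upper-tail probability of a chi-square statistic has a closed form for 1 and 2 degrees of freedom, so it can be computed with the Python standard library alone. The helper below is a minimal sketch with hypothetical names; it handles only those two cases.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """P(chi-square with `df` degrees of freedom exceeds x), for df = 1 or 2 only."""
    if df == 1:
        # A chi-square with 1 df is a squared standard normal.
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # A chi-square with 2 df is exponential with mean 2.
        return math.exp(-x / 2.0)
    raise ValueError("this sketch handles only df = 1 or df = 2")


p_occupation = chi2_upper_tail(8.88, 2)      # occupational level
p_gender = chi2_upper_tail(4.82, 1)          # gender
p_mental_health = chi2_upper_tail(11.26, 2)  # mental health rating
```

The first two probabilities fall between .01 and .05 and the third falls below .01, consistent with the thresholds reported for the three tests.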




Life Satisfaction in 1972 

Participants’ satisfaction with their experience in several domains as reported
in 1972 was analyzed as a function of their report in 1960 of having lived up to
their intellectual abilities and by gender. A 2 x 2 (Lived Up x Gender)
multivariate analysis of covariance (MANCOVA) was run with satisfaction with
occupational success, family life, and joy in living as the dependent variables, and
occupational level and mental health as covariates. The MANCOVA was
significant for lived up (Wilks lambda = .90), F(3, 296) = 10.37, p < .001. There was
not a significant multivariate effect for gender or for the Lived Up x Gender
interaction. In follow-up univariate analyses of covariance (ANCOVAs), there
was a significant lived-up effect for satisfaction with occupational success, F(1,
311) = 26.89, MSE = .86, p < .001; satisfaction with family life, F(1, 333) =
4.87, MSE = .92, p < .05; and joy in living, F(1, 332) = 13.85, MSE = .82, p <
.001, with means higher for the group reporting having lived up to their
abilities. Table 1 presents the means on the three variables for men and women
separately.
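The link between the reported Wilks lambda and the multivariate F can be illustrated with the exact conversion that applies when the effect being tested has a single degree of freedom, as a two-group factor does. The sketch below assumes that case and takes the numerator and denominator degrees of freedom directly from the reported F(3, 296); the function name is hypothetical.

```python
def wilks_to_f(wilks_lambda: float, df_num: int, df_den: int) -> float:
    """Exact F for a one-degree-of-freedom effect: ((1 - L) / L) * (df_den / df_num)."""
    if not 0.0 < wilks_lambda <= 1.0:
        raise ValueError("Wilks lambda must lie in (0, 1]")
    return (1.0 - wilks_lambda) / wilks_lambda * df_den / df_num


# Lived-up effect: lambda is reported to two decimals only.
f_lived_up = wilks_to_f(0.90, 3, 296)
```

Because lambda is rounded to .90, the result (roughly 11) only approximates the reported 10.37; a lambda near .905 would reproduce the published value.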

In the univariate ANCOVAs, there was a significant gender effect only for
satisfaction with occupational success, F(1, 311) = 5.04, MSE = .86, p < .05, with
men reporting higher satisfaction. In addition, there was a significant Lived Up
x Gender interaction only for satisfaction with occupational success, F(1, 311) =
5.27, MSE = .86, p < .05, with men who reported they had lived up to their
abilities particularly satisfied with their occupational success. Post hoc t-tests
conducted within gender groups demonstrated that the lived-up effect for
satisfaction with occupational success was significant for both gender groups, with the
effect stronger for men, t(161) = 6.22, p < .001, than for women, t(146) = 2.03,
p < .05.

Table 1

Mean Satisfaction With Occupational Success, Family Life, and
Joy in Living as Reported in 1972 for Men and Women
Who Reported in 1960 That They Had or Had Not Lived Up to
Their Intellectual Abilities

                        Not lived up                       Lived up
Domain                  M      SD           n              M      SD     n

Men
Occupational success    3.40   0.97         [illegible]    4.25   0.74   [illegible]
Family life             3.92   [illegible]  [illegible]    4.33   0.93   [illegible]
Joy in living           3.71   0.89         49             4.15   0.85   118

Women
Occupational success    3.26   1.08         57             3.63   1.05   91
Family life             4.00   1.14         67             4.22   0.91   104
Joy in living           3.76   1.22         66             4.24   0.82   105
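The within-gender comparison for women can be approximated from the summary statistics in Table 1. The sketch below assumes an ordinary pooled-variance independent-samples t-test, which may not be exactly the procedure the authors used; the function name is hypothetical, and the inputs are the women's occupational success means, standard deviations, and group sizes from the table.

```python
import math


def pooled_t(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int):
    """Pooled-variance t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (m2 - m1) / std_err, df


# Women, occupational success: not lived up (group 1) vs. lived up (group 2).
t_women, df_women = pooled_t(3.26, 1.08, 57, 3.63, 1.05, 91)
```

The degrees of freedom come to 146, matching the reported test, and the t value lands close to the reported 2.03; the small difference is expected given that the table entries are rounded to two decimals.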

Life Satisfaction Discrepancy in 1972 

Life satisfaction discrepancy (i.e., life satisfaction minus goal importance) was
analyzed in a 2 x 2 (Lived Up x Gender) MANCOVA. Life satisfaction discrepancy
pertaining to occupational success, family life, and joy in living were dependent
variables, and occupational level and mental health were covariates. The
MANCOVA was significant for lived up (Wilks lambda = .95), F(3, 272) = 4.65, p <
.01, and for gender (Wilks lambda = .95), F(3, 272) = 3.96, p < .01, but not for
the Lived Up x Gender interaction. The group that reported having lived up to
their intellectual abilities showed more favorable scores (i.e., less discrepancy in
the direction of negative self-assessment) than the not lived-up group, and men
showed more favorable scores than did women.

In follow-up univariate ANCOVAs, there was a significant lived-up effect for
life satisfaction discrepancy, with occupational success, F(1, 297) = 5.23, MSE
= 1.24, p < .05; family life, F(1, 322) = 5.70, MSE = 1.06, p < .05; and joy in
living, F(1, 310) = 10.65, MSE = .84, p < .01. The group that reported having
lived up to their abilities had more favorable scores on all three variables. There
was a significant gender effect for life satisfaction discrepancy only with family
life, F(1, 322) = 6.82, MSE = 1.06, p < .01, with men showing more favorable
scores than women. There were no significant univariate effects for the Lived
Up x Gender interaction.



Study 2 

The Terman study archives also afforded an opportunity to view longer-term
affective and motivational correlates of the participants’ 1960 appraisal of
having lived up to their intellectual abilities. In 1992, study
participants were asked about their overall life satisfaction and about what
choices they would make differently if they could live their lives again.

Based on conceptualizations of the long-term affective correlates of the self
that might have been (Higgins, 1987; Markus & Wurf, 1987), we hypothesized
that individuals who reported that they had not lived up to their intellectual
abilities in midlife would score lower on overall life satisfaction 30 years later.
Moreover, based on the view that the negative emotion cued by a negative
comparison with what might have been motivates efforts to cognitively undo the
aversive event (Roese & Olson, 1995), we hypothesized that individuals who
reported that they would make different choices in either the work or family
domains would report lower levels of life satisfaction than those who would not
make any choices differently. Moreover, we predicted that these relations would
be independent of prior mental health, objective achievement, and general
health in 1992 and would hold controlling for these variables.

Based on conceptualizations of the motivational correlates of the self that
might have been (Higgins, 1987; Markus & Nurius, 1986; Markus & Wurf,
1987), we hypothesized that the tendency to make different choices in the work
or family domains in contrast to the tendency to change nothing would be
predicted by the midlife assessment of having lived up to intellectual abilities 30
years earlier. Finally, we tested an integrative model of the associations among
the 1960 lived-up variable, life satisfaction discrepancies in 1972, and overall
satisfaction in 1992 in a structural equation model (SEM) using LISREL 8
(Jöreskog & Sörbom, 1993). Reasoning that an earlier self-appraisal would
operate through subsequent self-referent thought in predicting future outcomes,
we hypothesized that the relationship between the 1960 self-appraisal of having
lived up to intellectual abilities and overall satisfaction in 1992 would be
mediated by life satisfaction discrepancy in 1972.

Method 



Participants 

Participants in Study 2 were members of the Terman Study of the Gifted 
who responded to the follow-up survey in 1992 and who also met the overall 
selection criteria described in Study 1. The return rate for the 1992 survey was
76%. At the 1992 survey, participants ranged in age from 75 to 88 years, with a
mean age of 80. Due to missing data, the maximum sample size in Study 2 
analyses was 365 (178 men and 187 women). 

Measures 

Overall life satisfaction. In 1992, participants were asked, “All things
considered, how satisfied are you with your life these days?”² Response options
varied along a 9-point scale, ranging from 1 (completely dissatisfied) to 9
(completely satisfied). Single items indexing global life satisfaction have been used
extensively in survey research and have acceptable psychometric characteristics
(see Campbell, Converse, & Rodgers, 1976; Sauer & Warland, 1982).

Alternative life choices. In 1992, the participants were asked in an
open-ended question: “Looking back over your whole life what choices would
you make differently?” Responses had been content coded earlier by Terman
study research staff, who were blind to participants’ appraisal of having lived up
to their intellectual abilities in 1960 and to the present hypotheses. Consistent
with the present emphasis on the work and family domains, responses for analysis
were selected from three content categories: no change, family, and work.
The no change category included responses such as “no changes,” “no regrets,”
and “quite satisfied with choices.” The work category included responses such
as “chose wrong occupation,” “would have liked a different career,” and
“should have aimed higher in career.” The family category included responses
such as “would have chosen different mate,” “might have tried harder to be
married,” and “would spend more time in family relationships.”

² The correlations of overall life satisfaction in 1992 with the three satisfaction scores in 1972
were low to moderate (.32, .18, and .34 for satisfaction with occupation, family life, and joy in
living, respectively), making stability of life satisfaction less plausible as an alternative explanation
for the study findings.

General health. In 1992, participants were asked a question concerning their
general health since 1986. Response options varied along a 5-point scale, rang- 
ing from 1 (very poor) to 5 (very good). A two-level (good vs. poorer) health 
variable was defined as follows: Individuals who reported “good” or “very 
good” health (69.8 percent) were included in a good health group; individuals
who reported “very poor,” “poor,” or “fair” health (30.2 percent) were included 
in a poorer health group. Self-ratings of general health have good construct 
validity and tend to be positively correlated with physicians’ ratings (LaRue, 
Bank, Jarvik, & Hetland, 1979). Moreover, such ratings predict mortality 
beyond predictions based on objective indicators, such as physicians’
assessments from physical examinations (Idler & Kasl, 1991).

Results 



Overall Satisfaction in 1992 

The relationship of the 1960 lived-up variable with 1992 overall satisfaction 
was analyzed in a 2 x 2 x 2 (Lived Up x Health x Gender) ANCOVA. 
Occupational level and 1960 mental health were used as covariates. The 1992 
measure of overall life satisfaction was the dependent variable. The analysis was 
significant for lived up, F(1, 355) = 10.71, MSE = 2.25, p < .001, and for
health, F(1, 355) = 27.64, MSE = 2.25, p < .001. The Lived Up x Health
interaction was nonsignificant. For the lived-up factor, the mean of the group
reporting having lived up to their intellectual abilities was higher than that of the not
lived-up group (Ms = 6.95 and 6.30, respectively). For health, the satisfaction of
participants reporting good health was higher than that of participants reporting 
poorer health (Ms = 6.98 and 5.99, respectively). 

Alternative Life Choices 

To investigate the relation of appraisal of having lived up to intellectual
abilities in 1960 with alternative life choices as reported in 1992, a 2 x 3 x 2 (Lived
Up x Choice x Gender) hierarchical log linear analysis was run as a saturated
model. The three choice categories selected for analysis were no change,
alternative choices in the work domain, and alternative choices in the family
domain. Both the Lived Up x Choice interaction, χ²(2, N = 211) = 13.09, p <
.01, and the Choice x Gender interaction, χ²(2, N = 211) = 6.31, p < .05, were
significant. In addition, the three-way interaction (Lived Up x Choice x Gender)
was significant, χ²(2, N = 211) = 7.80, p < .05.³ Table 2 gives the distribution
of alternative choices across the three choice categories by gender across levels
of the lived-up variable.

Table 2

Distribution of Alternative Life Choices Reported in 1992 for Men
and Women Who Reported in 1960 That They Had or Had Not
Lived Up to Their Intellectual Abilities

                       Men              Women
Choice               n      %         n      %

Not lived up
  No change         10    27.8       23    53.5
  Work              19    52.8        9    20.9
  Family             7    19.4       11    25.6

Lived up
  No change         47    72.3       38    56.7
  Work              10    15.4       11    16.4
  Family             8    12.3       18    26.9

Follow-up chi-square analyses within gender groups indicated that men who 
felt they had not lived up to their intellectual abilities, compared with men who 
felt they had lived up to their abilities, were more likely to say they would make 
life choices differently, χ²(2, N = 101) = 20.22, p < .001. The predominant
response of men who did not live up to their abilities was to alter life choices in 
the work domain. For women, in contrast, the responses of those who did and 
those who did not live up to their abilities were comparably distributed across 
the no change, work, and family categories, χ²(2, N = 110) = .36, ns.
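
As a rough check on the follow-up tests, the Pearson chi-square statistic can be recomputed from the cell counts in Table 2. The sketch below is illustrative only: the function name `pearson_chi_square` and the list layout are assumptions introduced here, and nothing in the text says how the original analysis was actually carried out.

```python
import math

def pearson_chi_square(table):
    """Return (chi-square, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Counts from Table 2. Rows: not lived up, lived up.
# Columns: no change, work, family.
men = [[10, 19, 7], [47, 10, 8]]
women = [[23, 9, 11], [38, 11, 18]]

chi2_men, df_men = pearson_chi_square(men)
chi2_women, df_women = pearson_chi_square(women)

# With 2 degrees of freedom the upper-tail probability is exp(-chi2 / 2).
p_men = math.exp(-chi2_men / 2)
p_women = math.exp(-chi2_women / 2)
```

With these counts the statistic comes out at about 20.2 for men and about 0.36 for women, in line with the values reported above.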

Relation of Satisfaction to Alternative Life Choices 

The relation between alternative life choices and life satisfaction was
investigated in a 2 x 2 x 2 (Choice x Health x Gender) ANCOVA, with occupational
level and 1960 mental health as covariates. The choice factor was defined as
stating no changes would be made versus stating that different choices would be
made pertaining to either work or family. The 1992 measure of overall life
satisfaction was the dependent variable. The results were significant for choice,
F(1, 204) = 17.26, MSE = 2.10, p < .001, and health, F(1, 204) = 15.41, MSE =
2.10, p < .001. The no-change group reported higher satisfaction than the
change group (Ms = 7.29 and 6.33, respectively). In addition, the good-health
group reported higher satisfaction than the poorer-health group (Ms = 7.06 and
5.96, respectively).

³ A small number of participants (n = 11) gave responses in both the work and family
categories. An additional log linear analysis was run with the responses of these individuals included
in a choice category. The results were essentially the same as those for the three-level choice
category as reported above.

An Integrative Longitudinal Model 

We tested an integrative longitudinal model of the associations among the 
1960 lived-up variable, life satisfaction discrepancies in 1972, and overall satis- 
faction in 1992 in a latent variable SEM using LISREL 8 (Joreskog & Sorbom, 
1993). The 1960 appraisal of having lived up to intellectual abilities (coded 
dichotomously as “not lived up” = 0, “lived up” =1) was an exogenous vari- 
able, and overall satisfaction in 1992 was an outcome variable (both measured 
with single indicators). Life satisfaction discrepancy in 1972 (measured with 
three indicators — life satisfaction discrepancy pertaining to occupational suc- 
cess, family life, and joy in living) was included as a mediating variable 
between the 1960 self-appraisal and 1992 satisfaction. To provide a metric for 
the latent constructs and to identify the measurement model, the first indicator 
loading for each latent construct was set to 1.0 in the unstandardized solution 
for the model. Variance-covariance matrices were used in the LISREL analyses. 

The results of the LISREL test of the hypothesized model are presented 
graphically in Figure 1. The model provides a good fit to the data, overall 
χ²(4, N = 313) = 2.55, p > .60; adjusted GFI = .99. Based on examination of the
modification indices, a parameter reflecting correlation between the unique vari- 
ances for the measures of life satisfaction discrepancy pertaining to family life 
and joy in living was included in the model. All parameter estimates for the 
measurement model of the life satisfaction discrepancy latent construct and all 
parameter estimates in the structural model are significant at the .01 level. As 
predicted, the relationship between the 1960 self-appraisal of having lived up to 
intellectual abilities and overall satisfaction in 1992 was mediated by the life 
satisfaction discrepancy in 1972. The simple correlation in the model between 
the 1960 lived-up appraisal and 1992 overall satisfaction was significant,
r = .22, p < .01. However, consistent with the mediational interpretation, when a
direct path between the 1960 lived-up variable and 1992 overall satisfaction is
added to the model, model fit is not significantly improved,
χ²(1, N = 313) = 2.25, p > .10.
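
The two probability statements for the model can be checked from the chi-square values alone. The following is a minimal sketch, not the LISREL computation: `chi_square_p_value` is a hypothetical helper that covers only the two cases needed here (1 degree of freedom and even degrees of freedom), using closed-form expressions for the upper tail of the chi-square distribution.

```python
import math

def chi_square_p_value(statistic, df):
    """Upper-tail probability of a chi-square statistic (df = 1 or even df only)."""
    if df == 1:
        # With 1 df, chi-square is the square of a standard normal variate.
        return math.erfc(math.sqrt(statistic / 2))
    if df > 0 and df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        half = statistic / 2
        terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    raise ValueError("this sketch handles only df = 1 or even df")

# Overall fit of the hypothesized model, chi-square(4) = 2.55.
p_fit = chi_square_p_value(2.55, 4)

# Chi-square difference test for adding the direct path, chi-square(1) = 2.25.
p_direct_path = chi_square_p_value(2.25, 1)
```

Both probabilities agree with the reported bounds: the fit statistic gives a value above .60, and the difference test gives a value above .10, so adding the direct path does not reliably improve fit.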








Figure 1

[Path diagram not reproduced. The three observed indicators of the 1972 life
satisfaction discrepancy construct are Occupation, Family, and Joy in Living.]

Results of the LISREL test (standardized estimates) of the structural equation and
measurement models for an integrative longitudinal model. Latent constructs are
shown in ellipses, and observed variables are shown in rectangles. † indicates a
parameter set to 1.0 in the unstandardized solution; I represents unique variance in
the three indicators of life satisfaction discrepancy. * p < .01.



General Discussion 

Consistent with conceptualizations of the affective correlates of the self that 
might have been (Higgins, 1987; Higgins et al., 1986; Markus & Wurf, 1987), 
we found in Study 1 that a self-appraisal of having lived up to one’s intellectual 
potential in midlife predicted satisfaction in the work and family domains and 
joy in living 12 years later. In Study 2, we found that the same midlife appraisal 
also predicted overall life satisfaction three decades later. 

Specifically, individuals who reported that they had lived up to their intellec- 
tual abilities were more satisfied than were individuals who reported that they 
had not lived up to their abilities in each of the life domains assessed 12 years 
later and in overall satisfaction three decades later. Moreover, in an integrative 
structural equation model, we showed that the relation between a midlife self- 
appraisal of having lived up to intellectual abilities and overall satisfaction at 
age 80 was mediated by satisfaction discrepancy at age 61. 

Consistent with conceptualizations of the motivational correlates of the self
that might have been (Higgins, 1987; Markus & Nurius, 1986; Markus & Wurf,
1987), we found in Study 2 that individuals’ self-appraisals of having lived up 
to their intellectual abilities in midlife were related significantly to life choices 
they would make differently as reported three decades later. Individuals who 
reported that they had lived up to their intellectual abilities were more likely to 
say that they would not make any life choices differently. In contrast, individu- 
als who reported that they had not lived up to their intellectual abilities were 
more likely to say that they would make different life choices in the work or 
family domains. 

In Study 2, we also found that the life choices individuals would make dif- 
ferently were related significantly to overall life satisfaction. Although these 
correlational findings do not demonstrate direction of effect, they are consistent 
with the view that the negative emotion cued by an initial negative comparison 
motivates further efforts to cognitively undo the aversive event (Roese & Olson, 
1995). Individuals who reported that they would not make any life choices dif- 
ferently experienced more overall life satisfaction than did individuals who 
would make different life choices in the work or family domains. These findings 
reflect the significance of life regrets in the aging years (Erikson, Erikson, & 
Kivnick, 1986). They also may reflect the valence of unfinished business 
(Savitsky, Medvec, & Gilovich, 1997), because the choice responses over- 
whelmingly indicated regrets over omissions rather than actions taken (see also 
Hattiangadi, Medvec, & Gilovich, 1995). 

Congruent with traditional gender role norms, we found in Study 1 that men 
who reported that they had lived up to their intellectual abilities at midlife had 
particularly high occupational satisfaction 12 years later. Also, a slightly greater 
proportion of men than women said that they had lived up to their intellectual 
potential. In Study 2, more men than women who reported they had not lived up 
to their abilities responded that they would make work choices differently, and 
more men than women who reported they had lived up responded that there 
were no choices they would make differently. The pattern of responses in both 
studies reflects the vastly different opportunity structures confronting the men 
and women of the respondents’ generation. Case materials in the Terman study 
suggest that these differences in opportunities were perceived by the Terman 
women (see Holahan, 1994; Holahan & Sears, 1995). Overall, the Terman men 
experienced considerable occupational success. Although the women’s occupa- 
tional achievements were superior to those of women of their cohort, they were 
modest in comparison with those of the Terman men (for more information on 
the Terman sample, see Holahan & Sears, 1995). 

Some cautions should be noted in interpreting these results. Common 
method variance across measures (i.e., self-report questionnaires) may have con- 
tributed to the linkages between perceived regrets and life satisfaction (see 
Lecci, Okun, & Karoly, 1994). In addition, the findings of the present study are
evidence of correlation only. Further, the Terman sample is uniquely advantaged,
and attrition has made the sample somewhat more select in the areas of
education and, for the men, occupational success (Holahan & Sears, 1995). 

In summary, the present findings reflect the developmental significance of 
midlife self-appraisals (see Helson, 1992; Levinson, 1978). They may also
reflect the stability of adult personality, indicated by both trait approaches (e.g., 
Costa & McCrae, 1994) and studies of self-concept consistency (see Swann, 
1997). However, although our results show a large amount of consistency over 
time, they should not be interpreted as suggesting that revision and life change 
after midlife are impossible. In fact, such revision can be accomplished either 
behaviorally or cognitively (see Helson, 1992; Landman et al., 1995, for examples).



References 

Campbell, A., Converse, P. E., & Rodgers, W. L. (1976). The quality of American life. New York: Sage.

Costa, P. T., Jr., & McCrae, R. R. (1994). Set like plaster? Evidence for the stability of adult personality. In T.
F. Heatherton & J. L. Weinberger (Eds.), Can personality change? (pp. 21-40). Washington, DC:
American Psychological Association.

Erikson, E. H., Erikson, J. M., & Kivnick, H. Q. (1986). Vital involvement in old age. New York: Norton.

Hattiangadi, N., Medvec, V. H., & Gilovich, T. (1995). Failing to act: Regrets of Terman’s geniuses.
International Journal of Aging and Human Development, 40, 175-185.

Helson, R. (1992). Women’s difficult times and the rewriting of the life story. Psychology of Women
Quarterly, 16, 331-347.

Higgins, E. T. (1987). Self-discrepancy: A theory relating self and affect. Psychological Review, 94, 319-340.

Higgins, E. T., Bond, R. N., Klein, R., & Strauman, T. (1986). Self-discrepancies and emotional vulnerability:
How magnitude, accessibility, and type of discrepancy influence affect. Journal of Personality and
Social Psychology, 51, 5-15.

Holahan, C. K. (1984). Marital attitudes over forty years: A longitudinal and cohort analysis. Journal of
Gerontology, 39, 49-57.

Holahan, C. K. (1988). Relation of life goals at age 70 to activity participation and health and psychological
well-being among Terman’s gifted men and women. Psychology and Aging, 3, 286-291.

Holahan, C. K. (1994). Women’s goal orientations across the life cycle: Findings from the Terman Study of
the Gifted. In B. Turner & L. Troll (Eds.), Women growing older: Psychological perspectives (pp.
35-67). Thousand Oaks, CA: Sage.

Holahan, C. K., & Sears, R. R. (1995). The gifted group in later maturity. Stanford, CA: Stanford University
Press.

Idler, E. L., & Kasl, S. (1991). Health perceptions and survival: Do global evaluations of health status really 
predict mortality? Journal of Gerontology, 46, 55-65. 

Joreskog, K. G., & Sorbom, D. (1993). LISREL 8: User’s guide. Chicago: Scientific Software International.

Landman, J., & Manis, J. D. (1992). What might have been: Counterfactual thought concerning personal
decisions. British Journal of Psychology, 83, 473-477.

Landman, J., Vandewater, E. A., Stewart, A. J., & Malley, J. E. (1995). Missed opportunities: Psychological
ramifications of counterfactual thought in midlife women. Journal of Adult Development, 2, 87-97.

LaRue, A., Bank, L., Jarvik, L., & Hetland, M. (1979). Health in old age: How do physicians’ ratings and
self-ratings compare? Journal of Gerontology, 34, 687-691.

Lecci, L., Okun, M. A., & Karoly, P. (1994). Life regrets and current goals as predictors of psychological
adjustment. Journal of Personality and Social Psychology, 66, 731-741.

Levinson, D. J. (with Darrow, C. N., Klein, E. B., Levinson, M. H., & McKee, B.). (1978). The seasons of a
man’s life. New York: Knopf.

Levinson, D. J. (with Levinson, J. D.). (1996). The seasons of a woman’s life. New York: Knopf.

Markus, H., & Nurius, P. (1986). Possible selves. American Psychologist, 41, 954-969.

Markus, H., & Wurf, E. (1987). The dynamic self-concept: A social psychological perspective. Annual Review
of Psychology, 38, 299-337.

Martin, L. R., Friedman, H. S., Tucker, J. S., Schwartz, J. E., Criqui, M. H., Wingard, D. L., &
Tomlinson-Keasey, C. (1995). An archival prospective study of mental health and longevity. Health
Psychology, 14, 381-387.

Miller, W. E., & Miller, A. H. (1977). The CPS 1976 American National Election Study: Notes, frequencies,
addendum and questionnaires. Unpublished manuscript, Inter-University Consortium for Political and
Social Research, Ann Arbor, MI.

Oden, M. H. (1968). The fulfillment of promise: 40-year follow-up of the Terman gifted group. Genetic
Psychology Monographs, 77, 3-93.

Roese, N. J., & Olson, J. M. (1995). Counterfactual thinking: A critical overview. In N. J. Roese & J.
M. Olson (Eds.), What might have been: The social psychology of counterfactual thinking (pp. 1-55).
Mahwah, NJ: Erlbaum.

Ross, M., & Conway, M. (1986). Remembering one’s own past: The construction of personal histories. In R.
M. Sorrentino & E. T. Higgins (Eds.), Handbook of motivation and cognition: Foundations of social
behavior (pp. 122-144). New York: Guilford Press.

Sauer, W. J., & Warland, R. (1982). Morale and life satisfaction. In D. J. Mangen & W. A. Peterson (Eds.),
Research instruments in social gerontology: Vol. 1. Clinical and social psychology (pp. 195-210).
Minneapolis: University of Minnesota Press.

Savitsky, K., Medvec, V. H., & Gilovich, T. (1997). Remembering and regretting: The Zeigarnik effect and the
cognitive availability of regrettable actions and inactions. Personality and Social Psychology Bulletin,
23, 248-257.

Sears, P. S., & Barbee, A. H. (1977). Career and life satisfaction among Terman’s gifted women. In J. C.
Stanley, W. D. George, & C. H. Solano (Eds.), The gifted and the creative: A fifty-year perspective (pp.
28-65). Baltimore: Johns Hopkins University Press.

Sears, R. R. (1977). Sources of life satisfaction of the Terman gifted men. American Psychologist, 32,
119-128.

Swann, W. B. (1997). The trouble with change: Self-verification and allegiance to the self. Psychological
Science, 8, 177-180.

Terman, L. M. (with Baldwin, B. T., Bronson, E., DeVoss, J. C., Fuller, F., Goodenough, F. L., Kelley, T. L.,
Lima, M., Marshall, H., Raubenheimer, A. H., Ruch, G. M., Willoughby, R. L., Wyman, J. B., & Yates, D.
H.). (1925). Mental and physical traits of a thousand gifted children: Vol. 1. Genetic studies of genius.
Stanford, CA: Stanford University Press.








Consequences of How We Define and Assess 
Intelligence 

Wendy M. Williams, Cornell University



The author considers some of the consequences to society of how intelligence is 
defined and assessed. First, the author reviews historical approaches to under- 
standing and measuring intelligence to clarify traditional points of view and 
current responses to these positions. Next, she describes a study of the 
Graduate Record Examination that illustrates the strengths and weaknesses of 
traditional approaches to assessing intelligence. She then describes a research 
program that defined, assessed, and trained intelligence from a different per- 
spective — the perspective of practical intelligence. The article closes by con- 
sidering directions for future research and thinking about a broader and more 
ecologically relevant conception of intelligence that would lead to new and 
potentially fruitful approaches to assessment and training. 

What are the consequences of how our society defines and measures intelli- 
gence? Virtually everyone in the United States has been affected by prevailing 
views on the definition and assessment of intelligence. However, many people 
have never stopped to consider the impact of this issue on their lives. In this 
article, I discuss how the conceptualization of intelligence prevalent in the sci- 
entific community affects all of us, especially school children and college stu- 
dents. In some cases, these effects can be evaluated as being good versus bad; in 
other cases, they simply create advantages for certain groups of people possess- 
ing certain profiles of abilities. Any definition of intelligence carries with it a 
value judgment about the attributes and performances that are most prized by 
the society. In this discussion, I consider the logical consequences of the value 
judgments about intelligence that are emphasized by our society. 

I begin by briefly reviewing background on historical approaches to under- 
standing and measuring intelligence; my intention is to clarify the traditional 
points of view and current responses to these positions. Next, I describe a study 
that illustrates the strengths and weaknesses of traditional approaches to assess- 
ing intelligence, by portraying the consequences to graduate-school applicants 
of the use of the Graduate Record Examination (GRE). I follow by describing a 
research program that defined, assessed, and trained intelligence from a differ- 
ent perspective, and I discuss the implications of this approach. I close by con- 
sidering directions for future research and thinking about the nature, definition, 
assessment, and training of intelligence. 



Correspondence concerning this article should be addressed to Wendy M. Williams,
Department of Human Development, Cornell University, Ithaca, New York 14853. Electronic mail
may be sent via Internet to wmw5@cornell.edu.

Reprinted from Psychology and Aging, Vol. 14, No. 2, 239-244, 1999. 








Historical Approaches to Defining and Measuring Intelligence 

A general definition of intelligence that most experts would accept is one 
that views intelligence as representing goal-directed, adaptive behavior. Two 
studies, one in 1921 and one in 1986, asked experts to define intelligence
(see Sternberg & Detterman, 1986). Common themes in the two groups of 
experts’ opinions were the importance of learning from experience and the abili- 
ty to adapt to the environment. In 1986, experts also mentioned the importance 
of people’s understanding and control of their own thinking processes. Despite 
these apparent similarities among experts in their views of intelligence, early 
attempts to define and measure intelligence followed quite different trajectories. 

The First Intelligence Tests 

Two different traditions in the study of intelligence date back to the late 19th
century, to work by Sir Francis Galton and Alfred Binet. Galton’s (1883) psychophysical
view of intelligence emphasized low-level tasks that tapped physical
abilities in addition to mental abilities. Galton tested intelligence by measuring
physical capabilities such as grip strength. However, scores on tests of grip
strength and other physical abilities were not related to performance in school, a 
key domain in which people wanted to predict achievement (Wissler, 1901). 
Interest in Galton’s views consequently waned. In some ways, however, Galton
proved to be prescient: Later researchers, particularly Jensen (1982), discovered
that when reliable measures (gathered using modern, reliable equipment) were
developed to assess Galton’s theory, they did correlate with the type of intelligence
measured by Binet (discussed below) and others. However, following the
early apparent disconfirmation by Wissler (1901), the Galtonian tradition 
remained largely unexplored for most of the 20th century; far more interest was 
shown in the perspective of Binet. 

Binet approached the problem of defining intelligence very differently. He 
saw intelligence as consisting of direction (knowing what to do and how), adap- 
tation (selecting a strategy for performing a task and monitoring one’s success), 
and criticism (knowing how to critique one’s work). In 1904, Binet and
Theodore Simon developed tests for the Paris school system that were
designed to differentiate mentally defective children from children who were 
failing in school for other reasons. Binet’s stated goal was to measure the abili- 
ties to judge well, comprehend well, and reason well (Binet & Simon, 1916). He 
developed the concept of mental age, which defined a child’s intellectual per- 
formance compared with an average child of the same chronological age. 
Dividing mental age by chronological age (and multiplying by 100) results in a 
number called the IQ. 
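
The ratio described above can be written out directly. This is a small illustrative sketch; the function name `ratio_iq` and the example ages are assumptions made here, not values from the article.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age, multiplied by 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age / chronological_age

# Hypothetical children, with ages given in years.
average_child = ratio_iq(mental_age=10, chronological_age=10)   # 100.0
advanced_child = ratio_iq(mental_age=12, chronological_age=10)  # 120.0
```

A child whose mental age matches his or her chronological age scores 100 by construction, and a child performing at the level of an older average child scores above 100.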

The next major improvement to the intelligence test occurred when Lewis 
Terman revised Binet and Simon’s (1916) test, creating the Stanford-Binet
Intelligence Scales (Terman & Merrill, 1937), a set of tests that is still widely
used today. The Stanford-Binet contains the following types of items: verbal 
reasoning, consisting of vocabulary, comprehension, absurdities, and verbal 
relations; quantitative reasoning, consisting of number series and arithmetic 
word problems; figural/abstract reasoning, consisting of pattern analysis; and 
short-term memory, consisting of memory for sentences, digits, and objects. 
Another major series of intelligence tests in common use today, developed by 
David Wechsler (1981), differs somewhat from the Stanford-Binet test. The 
Wechsler intelligence scales provide three scores: verbal, based on such subtests 
as Vocabulary and Verbal Similarities; performance, based on such subtests as 
Picture Completion and Picture Arrangement; and overall, which is a combination
of the verbal and performance scores.

Binet’s work a century ago defined a basic prototype for an intelligence test 
that remains substantially the same today. Scientists and others who use intelli- 
gence tests in research today are thus tacitly adopting a view of intelligence 
defined by Binet. Those who use modern versions of intelligence tests may not
always realize that the use of these carries with it certain assumptions about the 
nature of intelligence, for example, that intelligence consists of the ability to 
respond quickly (e.g., some subtests award bonus points for speed) and solve 
mathematical problems. However, whenever a test is used to measure an ability, 
there is an assumption being made that the way the test constructor conceptual- 
ized that ability is reasonable. If a student takes a traditional intelligence test, 
scores well, and is consequently deemed very intelligent, we must bear in mind 
that intelligence in this sense means reading quickly and solving mathematical 
problems, for example. For the purposes of this discussion, it is important to
remember the types of assumptions that underlie the use of the widely used 
intelligence tests. 

Psychometric Theories 

All of this early interest in measuring intelligence was associated with the 
development of psychometric theories of intelligence, which view intelligence 
largely as a map of the mind (Sternberg, 1990). An early psychometric theory 
was proposed by Charles Spearman in 1904. Spearman used factor analysis to 
divide intelligence into what he called “g,” or a single general factor, and multi- 
ple specific factors, each of which he called “s.” How exactly is an estimate of g 
for a specific person obtained? The person completes a test such as the Wechsler 
or Stanford-Binet; next, the scorer conducts a principal-component analysis, 
which results in a first major factor representing internally consistent informa- 
tion about the test-taker’s performance. This factor is called g. 
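
To make the extraction of a first component concrete, the sketch below applies power iteration to a small correlation matrix. It rests on several assumptions: the four-subtest correlation matrix and the z scores are invented for illustration, the function name is hypothetical, and power iteration is only one of several ways to obtain the largest component. In practice the component is derived from the correlations among subtest scores in a sample of test takers, and an individual's standing on it is then computed from that person's own scores.

```python
import math

# Hypothetical correlations among four subtests (illustrative values only).
R = [
    [1.0, 0.6, 0.5, 0.4],
    [0.6, 1.0, 0.5, 0.4],
    [0.5, 0.5, 1.0, 0.3],
    [0.4, 0.4, 0.3, 1.0],
]

def first_principal_component(corr, iterations=200):
    """Power iteration: return (eigenvalue, unit eigenvector) of the largest component."""
    size = len(corr)
    vector = [1.0 / math.sqrt(size)] * size
    for _ in range(iterations):
        product = [sum(corr[i][j] * vector[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    eigenvalue = sum(
        vector[i] * corr[i][j] * vector[j] for i in range(size) for j in range(size)
    )
    return eigenvalue, vector

eigenvalue, weights = first_principal_component(R)

# Loadings of each subtest on the first component.
loadings = [math.sqrt(eigenvalue) * w for w in weights]

# Share of the total variance carried by the first component.
variance_share = eigenvalue / len(R)

# One hypothetical person's standardized subtest scores (z scores).
z_scores = [1.2, 0.8, 1.0, 0.5]
component_score = sum(w * z for w, z in zip(weights, z_scores))
```

In this made-up example every subtest loads positively on the first component, which accounts for roughly 60 percent of the total variance; that pattern of uniformly positive loadings is what the general factor is meant to capture.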

g was seen by Spearman as a type of intelligence that influenced perform- 
ance on all mental tests, whereas each s factor was thought to be involved in 
performance on a single type of test over and above the contribution made by g.







Spearman saw the general factor, g, as being at the heart of intelligence, and 
many researchers today would still agree with him (e.g., Gottfredson, 1986, 
1996; Hunter & Schmidt, 1996). Subsequent theories concluded that the core of 
intelligence resided not in one factor but, rather, in multiple primary mental 
abilities (e.g., Cattell, 1971; Guilford, 1982; Thurstone, 1938; Vernon, 1971). 
These abilities included verbal comprehension, verbal fluency, inductive reason- 
ing, spatial visualization, number, memory, and perceptual speed. In general, the 
key to all psychometric theories of intelligence is that they propose specific 
structures of intelligence explaining the organization of the construct.

Information-Processing Theories of Intelligence 

Information-processing theories focus on how people think and reason with 
their knowledge. Jensen (1982) looked at choice reaction time (how quickly a 
person can decide which button to push on a box); Hunt (1978) looked at lexical 
access speed (how fast people can retrieve information about words, or recog- 
nize the differences between pairs of letters such as AA, Aa, AB, and aB). 
Sternberg (1977a, 1977b) studied individual differences in intelligence by look- 
ing at how people solve verbal analogies. Simon (1976) looked at even more 
complex types of reasoning, such as those involved in playing chess. All of the 
information-processing approaches share an emphasis on the process of thinking 
and reasoning, rather than on the actual structure of intelligence. 

Contemporary Systems Theories of Intelligence 

Two contemporary theorists, Howard Gardner and Robert Sternberg, have 
proposed alternative ways to think about intelligence. Their views attempt to 
explain both how intelligence enables an individual to think and reason and how 
intelligence is structured. Gardner’s (1983, 1993) theory of multiple intelli- 
gences proposed the existence of seven distinct intelligences, which can func- 
tion alone or can interact to produce overall intelligent behavior — linguistic, 
logical-mathematical, spatial, musical, bodily-kinesthetic, interpersonal, and 
intrapersonal. Gardner sees these seven intelligences as originating in different 
portions of the brain. 

Sternberg’s (1985, 1988) triarchic theory of intelligence emphasized a set of 
relatively interdependent processes. This theory postulates the existence of three 
important aspects of intelligence: componential, referring to information-pro- 
cessing components underlying intelligent performance (planning, monitoring, 
and evaluating performance; implementing one’s plans; learning how to solve 
problems); experiential, which relates intelligence to experience; and contextual, 
which relates intelligence to everyday contexts (adaptation to, shaping of, and 
selection of environments). The systems theories of Gardner and Sternberg, as 
well as those of other researchers, are, obviously, quite broad, and they have 
been criticized as being difficult to test fully or potentially to disconfirm, or 
both. 

Implications for Assessment of the Different Theories of Intelligence 

Suppose that a person accepts the psychometric view of intelligence and 
believes that g, or the general factor, is the best way to conceptualize meaning- 
ful intellectual ability. In such a case, measuring intelligence means measuring g 
(as just described). If a person believes that what is most relevant is a specific 
factor used in a specific type of performance, a test can be devised that meas- 
ures success at this type of performance. However, if a person accepts a systems 
theory of intelligence, the task of assessing intelligence becomes quite different. 
In recognizing that intelligence is a complex process, systems theories necessar- 
ily define intelligence more broadly and make it more difficult to create a single 
test that would fairly measure intelligence as conceptualized within the systems 
view. 

Most of the assessments, particularly the standardized assessments, used in 
North American schools are grounded in a relatively psychometric view of intel- 
ligence. In fact, as discussed above, the intelligence tests in wide use today are 
actually quite similar to the original tests developed by Binet and Simon in 
1904. Tests such as the Scholastic Assessment Test (SAT) and Preliminary 
Scholastic Assessment Test (PSAT), the GRE, the Law School Admission Test 
(LSAT), and the Graduate Management Admission Test (GMAT), for example, 
all measure verbal and mathematical knowledge and reasoning. So far, there are 
no commercially available tests based on the systems view of intelligence that 
represent valid, reliable alternatives to the psychometric assessments currently 
in use. However, this does not mean that psychologists and educators should not 
evaluate the tests currently in use in order to advance the thinking about what 
constitutes intelligence and meaningful intellectual performance. In addition, by 
evaluating widely used tests and by determining where they may fall short, 
researchers may help to advance the development of better tests for future use. 

How Good Are Current Tests at Predicting Real-World Performance? 

I have discussed the origins of today’s intelligence tests and the different the- 
oretical perspectives that give rise to different types of intelligence tests. I have 
stated that intelligence tests were originally developed to predict school per- 
formance. But how good are the tests being used today? Do they predict mean- 
ingful aspects of intellectual performance in real-world environments? Are peo- 
ple who score high on intelligence tests more successful in general in their 
lives? 




The degree to which intelligence tests predict out-of-school criteria such as 
job performance, for example, is a controversial question. Some believe that 
there is little or no justification for using tests of cognitive ability for job selec- 
tion (McClelland, 1973). Others believe that cognitive ability tests are valid pre- 
dictors of job performance for a wide variety of job settings (Barrett & Depinet, 
1991) or even for all job settings (Schmidt & Hunter, 1981; see also Hawk, 
1986; Gottfredson, 1986). Suppose that one accepts the pro-test position that 
stresses both the link between intelligence test scores and real-world perform- 
ance and the fact that these scores are the best known predictors of job success. 
It is still the case that the majority of variance in real-world performance is not 
accounted for by intelligence test scores. 

The average validity coefficient between cognitive-ability tests and measures 
of job performance is about .2 (Wigdor & Garner, 1982), meaning that test 
scores account for only four percent of the variance in job performance. The 
average validity coefficient between cognitive-ability tests and measures of per- 
formance in job training programs is about double (.4) that found for job per- 
formance itself, which suggests that the magnitude of prediction varies as a 
function of how comparable the criterion measure is to schooling. When the 
contexts are similar, as training is to sitting in a classroom, the predictive 
power of cognitive-ability tests is far greater. 
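
The step from a validity coefficient to "percent of variance" is simply squaring the correlation. As a rough illustration only, the short Python sketch below applies that step to the two coefficients quoted above; the function name `variance_explained` is an illustrative choice and does not come from the cited studies.

```python
def variance_explained(r):
    """Proportion of criterion variance accounted for by a predictor
    whose validity coefficient (correlation with the criterion) is r."""
    return r ** 2


# Validity coefficients quoted in the text above.
for label, r in [("job performance", 0.2), ("job training", 0.4)]:
    pct = 100 * variance_explained(r)
    print(f"{label}: r = {r:.1f} -> about {pct:.0f}% of variance explained")
```

Because the relation is quadratic, doubling the coefficient from .2 to .4 quadruples the share of variance accounted for.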

Hunter and Schmidt have argued that validity coefficients should be correct- 
ed for unreliability in test scores and criterion measures and for restriction of 
range caused by the fact that only high scorers are hired. They believe that mak- 
ing these corrections results in better estimates of the true relation between cog- 
nitive-ability test performance and job performance, by raising the average 
validity coefficient to the level of about .5 (see, e.g., Hunter & Hunter, 1984; 
Schmidt & Hunter, 1981). This .5 value is hypothetical and is not routinely 
obtained in practice. But even the figure of .5 means that intelligence scores 
account for only 25 percent of the variance in job performance (i.e., the square 
of .5). 
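
To make the two kinds of correction concrete, the sketch below uses the standard textbook formulas: Spearman's correction for attenuation and Thorndike's Case II correction for direct range restriction. It is not a reconstruction of the procedure Hunter and Schmidt actually used; the reliabilities (.80 and .60), the standard-deviation ratio (1.5), and the order in which the corrections are applied are all hypothetical values chosen purely for illustration.

```python
import math


def correct_for_attenuation(r_obs, rel_test, rel_criterion):
    """Spearman's correction: estimate the correlation that would be
    observed if test and criterion were measured without error."""
    return r_obs / math.sqrt(rel_test * rel_criterion)


def correct_for_range_restriction(r_restricted, sd_ratio):
    """Thorndike Case II correction for direct range restriction.
    sd_ratio = SD of test scores among all applicants divided by the
    SD among those actually selected (>= 1 when range is restricted)."""
    r, u = r_restricted, sd_ratio
    return r * u / math.sqrt(1 + r * r * (u * u - 1))


# Hypothetical inputs (assumptions, not values from the cited studies).
r_observed = 0.2
r_step1 = correct_for_range_restriction(r_observed, sd_ratio=1.5)
r_step2 = correct_for_attenuation(r_step1, rel_test=0.80, rel_criterion=0.60)

print(f"observed r           = {r_observed:.2f}")
print(f"after range correction = {r_step1:.2f}")
print(f"after both corrections = {r_step2:.2f}")
print(f"variance explained     = {100 * r_step2 ** 2:.0f}%")
```

Both corrections can only raise the coefficient, which is why corrected estimates are larger than the values obtained in practice; even so, the squared coefficient stays well below 100 percent.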

One might respond to the figure of 25 percent by thinking that traditional 
intelligence measures are not highly predictive and contain insufficient informa- 
tion to be of value in real-world decision making (e.g., in personnel hiring deci- 
sions). This is not necessarily true, however; Hunt (1995) has argued that sub- 
stantial savings to employers can result from the use of currently available psy- 
chometric tests of intelligence to screen and select job applicants, even if the 
validity coefficients are very small. On the other side of this controversy is 
McClelland (1973), who has questioned the validity of cognitive-ability testing 
for predicting real-world criteria such as job performance. McClelland has 
argued in favor of competency tests that more closely reflect job performance 
itself. 








Regardless of which side one wishes to endorse, it is clear that between 75 
percent and 96 percent of the variance in real-world criteria such as job per- 
formance cannot be accounted for by individual differences in intelligence test 
scores. Thus, it would seem that scientists should be able to augment or improve 
upon the types of tests in use today, either by modifying current tests or by 
strengthening prediction by using additional tests that measure different 
aspects of intellectual performance. 

One common cognitive-ability test, the GRE, is widely used in selecting 
applicants to matriculate in graduate school. The GRE is a well-known psycho- 
metric test that can be factored into a g factor and several s factors. Not surpris- 
ingly, the GRE is correlated moderately with IQ scores and all other g-saturated 
tests. The graduate-school environment is clearly a scholastic one, so presum- 
ably the GRE should be able to predict who will succeed in this environment. 
But, as anyone who has attended graduate school knows, there is more involved 
in succeeding in graduate school than just book-smarts. I now review a study I 
conducted with Robert Sternberg that evaluated the empirical validity of the 
GRE as a predictor of success in a graduate program in psychology (Sternberg 
& Williams, 1997). 



Evaluating the GRE 

Graduate programs use a variety of predictors to select those applicants who 
best match the programs and who offer the most to the field. An important ele- 
ment of every student’s application is her or his score on the GRE. Students 
learn early that GREs are not to be taken lightly (literally or figuratively). 
Average scores are published in guides to graduate programs to help potential 
graduate students select “appropriate” programs before making an application. 
Some graduate programs list average scores of accepted students with the mate- 
rials they send students to help them decide whether they should bother to apply 
at all. Other programs have taken the scores seriously enough to use them in a 
quantitative formula to help make admissions decisions (Dawes, 1971, 1975). 
Many programs have either explicit cutoffs or tacit minima, meaning that appli- 
cants who receive scores below these levels are almost never admitted. 

Our study considered whether the GRE deserves its role, in light of its ability 
to predict who will succeed in graduate school. We evaluated the Verbal, 
Quantitative, and Analytical tests of the GRE, as well as the psychology 
subject-matter advanced achievement test. Our concern was with how the test is 
used, rather than with the test itself. The GRE has well-documented and ade- 
quate predictive validity with respect to certain criteria (see, e.g., Briel, O’Neill, 
& Scheuneman, 1993). The question we addressed was whether the criteria for 
which the GRE best predicts are the ones we care the most about. If the predict- 
ed criteria are secondary, then perhaps we need to seek additional forms of 
assessment. 




How Is the GRE Used? 



The admissions process is complex, and admissions committees must consider 
multiple factors. In the psychology department at Yale University, for example 
(the site at which this research was conducted), GRE scores are just one of 
many factors considered. There are no explicit cutoffs, although the lower the 
level of the scores, the more an applicant needs compensating factors to gain 
admission. Applicants with very low scores are almost never admitted, regard- 
less of compensating factors. At other institutions, GREs are used differently. In 
some fields at Cornell University, applications for admission are sorted upon 
arrival into four boxes, labeled “GRE Below 1200,” “1200 to 1300,” “1310 to 
1400,” and “Above 1400.” This procedure is deemed necessary because some 
departments receive well over 100 applications for only eight admission places 
(for example), and the departments lack the personnel and time to read every 
application equally closely. In addition, the admissions committee has found 
over the years that there is very little variance in the strength and quality of let- 
ters of recommendation and undergraduate grade point average (GPA), leaving 
the students’ personal statements and GREs as the main sources of variation. 

In past research on testing, Robert Sternberg and I described a factor called 
the publication reason, which increases reliance on tests such as the GRE (e.g., 
Sternberg, 1988; Williams et al., 1996). When average test scores such as GREs 
(and SATs, LSATs, GMATs, etc.) are published, there is pressure upon universi- 
ty personnel to keep these average scores high to remain competitive with other 
institutions in the public eye. However, if a department admits only students 
with high GREs, department personnel may come to believe that high GREs are 
essential for success in their graduate program, as everyone who succeeds has 
high GREs. If no students with low or moderate GREs are ever admitted, it 
becomes impossible to falsify the view that high GREs are essential for success. 
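
A related statistical consequence of admitting only high scorers is restriction of range: within the admitted group, the test-outcome correlation looks weaker than it is among all applicants. The small simulation below is a sketch under stated assumptions, not an analysis of real admissions data. The "true" correlation of .5, the cutoff of one standard deviation above the mean, the sample size, and the random seed are all hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(true_r=0.5, cutoff=1.0, n=20000, seed=0):
    """Return (r among all applicants, r among admitted applicants only)."""
    rng = random.Random(seed)
    scores, outcomes = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)                      # standardized test score
        noise = rng.gauss(0, 1)
        y = true_r * x + math.sqrt(1 - true_r ** 2) * noise  # later outcome
        scores.append(x)
        outcomes.append(y)
    admitted = [(x, y) for x, y in zip(scores, outcomes) if x > cutoff]
    r_all = pearson_r(scores, outcomes)
    r_admitted = pearson_r([x for x, _ in admitted], [y for _, y in admitted])
    return r_all, r_admitted


r_all, r_admitted = simulate()
print(f"r among all applicants:  {r_all:.2f}")
print(f"r among admitted only:   {r_admitted:.2f}")
```

In this toy setting, a department that sees only the admitted group has no low scorers against whom to compare outcomes, which is the situation the paragraph above describes.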

Another point admissions committees must consider is that many of the 
sources of financial support for entering students are university wide. The stu- 
dents who receive this financial aid are often selected solely on the basis of 
GREs, because there are no other criteria that can be directly and fairly com- 
pared across academic fields (i.e., all of the students nominated for such fellow- 
ships possess high GPAs and glowing letters of recommendation). It can, thus, 
be difficult for departments to secure funding for students with low GREs. 
Hence, GRE scores find their way into the selection process from several differ- 
ent angles. 

Why Study the GRE? 

In general, the GRE is used extensively in admissions and financial-aid deci- 
sion making, alongside other factors and types of information. It is also used by 
government agencies, such as the National Science Foundation, as one factor in 
awarding graduate fellowships. For students, the GRE and preparation for it are 
expensive, time consuming, and potentially anxiety provoking. The GRE yields 
scores that are taken as indications of intellectual abilities, and students usually 
take these scores seriously. In addition — and important for the purposes of our 
discussion — the GRE may mispredict performance in a graduate program, 
unfairly disadvantaging some students and advantaging others. 

The GRE is based on conventional psychometric notions of abilities (dis- 
cussed above), which traditionally have tended to emphasize some abilities 
(e.g., verbal, quantitative, and analytical), arguably at the expense of other abili- 
ties (e.g., creative, practical; see Sternberg, 1985, 1988, 1996). The GRE is also 
an example of psychological theory put into practice and raises practical 
problems of prediction or the lack thereof. Finally, if GRE scores are not sufficiently 
valid for the kinds of decisions for which they are used, mental contamination 
may result in their being used anyway, as long as they are available (T. D. 
Wilson & Brekke, 1994). In other words, knowing a student’s GRE scores may 
tacitly influence admissions decisions, even if the scores are acknowledged to 
be of limited value. 

What Has Past Research Revealed About the Usefulness of the GRE? 

Dunlap (1979) studied admissions criteria as predictors of academic 
performance and professional potential of social-work students. Performance was 
successfully predicted from the faculty interview and undergraduate GPA. The 
GRE was a weak predictor, and letters of reference were of little value. Other 
studies have shown that GREs can be good predictors of grades and faculty 
evaluations, at least for first-year graduate performance in psychology (Dawes, 
1971). Clearly, the test has been demonstrated to have some predictive validity 
to some criteria of success in graduate school; however, an informal review of 
149 studies on the predictive validity of the GRE across fields showed that, on 
average, the GRE considered alone accounted for a little less than 10 percent of 
the variation in the various criteria of graduate performance (Wood & Wong, 
1992). 

According to the GRE Technical Manual (Briel et al., 1993), although a 
three-factor solution fits the verbal, quantitative, and analytical portions fairly 
well, the items tend to be rather highly correlated. Rock, Werts, and Grandy 
(1982) found that the verbal and quantitative factors were correlated .64, the 
verbal and analytical items .77, and the quantitative and analytical items .77. A 
slightly better fit of model to data was obtained when a reading-comprehension 
item was separated from the verbal item (see also Powers & Swinton, 1981). 
Similar correlations between pairs of items have been found by others (e.g., 
Schaeffer & Kingston, 1988). 

Empirical validities of the GRE vary somewhat by field. K. M. Wilson 
(1979) showed that, in the prediction of first-year grades, the median validity 
coefficients for first-year grades in psychology graduate school were .18 for the 
Verbal test, .19 for the Quantitative test, and .32 for the subject-matter test. The 
tendency for the subject-matter test to predict first-year grades better than the 
Verbal, Quantitative, and Analytical tests is common. In another cooperative 
study, the median correlation of the subject-matter test with first-year grades 
was .31 (Burton & Turner, 1983). In this same study, predictive validities of the 
GRE to first-year grades for the Verbal and Quantitative tests over all social sci- 
ences were .26 and .22, respectively. Schneider and Briel (1990) found overall 
correlations with first-year grades of .26 for the Verbal test, .25 for the 
Quantitative test, .24 for the Analytical test, and .36 for the subject-matter test. 
Undergraduate GPA showed a similar correlation to that for the subject-matter 
test (.34). Recent research focusing on graduate training in physics revealed that 
GREs provide only marginal prediction of graduate performance. In addition, 
there are sex differences in scores on the Physics Advanced Test (in favor of 
men) that are not reflected in graduate-school performance (Glanz, 1996). 

Some studies have focused on criteria other than first-year grades in graduate 
school. Rock (1974) found that for applicants for National Science Foundation 
fellowships, correlations of GRE scores with attainment of the PhD in psycholo- 
gy in two random samples were .12 and .19 for the Verbal test, .33 and .14 for 
the Quantitative test, and .19 and .24 for the subject-matter test. Schrader (1978, 
1980) obtained citation counts from the Social Sciences Citation Index and from 
the Annual Reviews of Psychology as well as publication rates from the 
Psychological Abstracts. He found correlations with these criteria of .15 to .30 
for the GRE Verbal, .24 to .32 for the GRE Quantitative, and .32 to .47 with the 
GRE subject-matter advanced test. But in another study, Clark and Centra 
(1982) failed to detect any significant correlations between publication rates and 
GRE scores for recent PhDs. 

Some studies have focused on relative predictions for various subpopulations 
of graduate-school students. Braun and Jones (1985) found no differential pre- 
diction across subgroups varying in age, sex, or race; however, Swinton (1987) 
found significant underprediction of first-year grade averages for women in all 
fields of graduate study. 

In related research focusing on the SAT, Crouse and Trusheim (1991) argued 
that the selection benefits colleges derive from using the SAT in admissions 
decisions are minimal (see, also, Crouse & Trusheim, 1988; Jencks & Crouse, 
1982). Similarly, in research focusing on the Medical College Admission Test 
(MCAT), Gough and Hall (1975) studied the prediction of academic versus clin- 
ical performance in medical school. They found that academic performance was 
predicted by the MCAT and premedical GPA. However, clinical performance 
was not predicted from MCAT scores and premedical academic-achievement 
indices. Gough and Hall noted that the clinical performance factor was more 
important than academic attainment in explaining who excelled in medical 
school. 




Why Don’t These Tests Better Predict Who Will Succeed in College and Graduate School? 

The fact that the tests used to select applicants do not robustly predict who 
will succeed in college or graduate school raises the issue of academic versus 
practical or real-world problems (e.g., Neisser, 1976; Neisser et al., 1996). 
Neisser (1976) was one of the first psychologists to press the distinction 
between academic and practical intelligence. He described academic-intelli- 
gence tasks (common in the classroom and on intelligence tests) as formulated 
by others, often of little or no intrinsic interest, having all needed information 
available from the beginning and disembedded from an individual’s ordinary 
experience. In addition, these tasks usually are well defined, have but one cor- 
rect answer, and often have just one method of obtaining the correct solution 
(Wagner & Sternberg, 1985). Note that these characteristics apply less well to 
many of the problems we face in our daily lives, especially at work. Work prob- 
lems often are unformulated or in need of reformulation, of personal interest, 
lacking in information necessary for solution, related to everyday experience, 
poorly defined, characterized by multiple “correct” solutions (each with liabili- 
ties as well as assets), and characterized by multiple methods for picking a prob- 
lem solution. 

The distinction between academic intelligence (“book smarts”) and practical 
intelligence (“street smarts”) has long been recognized by the nonscientist. 
Many common expressions attest to the essential role of practical intelligence in 
everyday life (e.g., “learning the ropes” and “getting your feet wet”). Both 
laypeople and researchers include concepts of academic and practical intelli- 
gence in their own implicit theories of intelligence (Sternberg et al., 1981). 
Recently, practical intelligence has been the focus of a growing number of stud- 
ies carried out in a wide range of settings and cultures. Summaries of aspects of 
this literature have been provided by Ceci (1996), Rogoff and Lave (1984), 
Scribner and Cole (1981), Sternberg and Wagner (1986, 1994), Sternberg, 
Wagner, and Okagaki (1993), and Voss, Perkins and Segal (1991). 

The distinction between academic and practical intelligence is illustrated by 
studies in which participants were assessed on both academic and practical 
tasks. The consistent result is little or no correlation between performance on 
the two kinds of tasks. IQ is unrelated to the order-filling performance of 
milk-processing plant workers (Scribner, 1986); the degree to which racetrack 
handicappers use a complex and effective algorithm (Ceci & Liker, 1986, 1988); 
the complexity of strategies used in computer-simulated roles such as city 
manager (Dörner & Kreuzig, 1983; Dörner, Kreuzig, Reither, & Stäudel, 1983); and 
the tacit knowledge of undergraduates (Wagner, 1987; Wagner & Sternberg, 
1985), business managers (Wagner & Sternberg, 1990), salespeople (Wagner, 
Rashotte, & Sternberg, 1992), and Air Force recruits (Eddy, 1988). In addition, 
the accuracy with which grocery shoppers identified quantities of food that 
provided the best value per price was unrelated to their performance on the M.I.T. 
mental arithmetic test (Lave, Murtaugh, & de la Rocha, 1984; Murtaugh, 1985). 

The distinction between academic and practical problems can help us under- 
stand why the GRE and tests like it fall short in predicting performance in 
real-world environments (as I will discuss below). It is not surprising that 
standardized tests tend to predict later performance on academic tasks better than 
they predict performance on practical or real-world tasks. In the case of the medical school 
research (Gough & Hall, 1975), the MCAT predicted later performance of an 
academic type, but not the essential clinical performance of medical-school stu- 
dents, which involves solving different types of problems from those found on 
standardized tests and course exams. For these and other related reasons, 
researchers have been critical of the entire psychometric approach (e.g., Ceci, 
1996; Gardner, 1983, 1993; Neisser et al., 1996). 

Partly in response to the psychometric tradition, a growing body of research 
has focused on delineating and assessing the types of ability needed to succeed 
on practical as opposed to academic problems (see, e.g., Wagner, 1987; Wagner 
& Sternberg, 1985; Williams & Sternberg, 1993; Williams et al., 1996; for a 
review, see Sternberg, Wagner, Williams, & Horvath, 1995). Much of this 
research on practical problems is related to the systems-theory view of intelli- 
gence, discussed earlier. The psychometric view of intelligence, and specifically 
the view that g is the single best measure of intelligence, is often seen as being 
at odds with current systems views. Members of the psychometric camp tend to 
point out that the GRE and tests like it provide the best estimates available of 
the type of ability needed for graduate school; for example, if g is what matters 
to success, then people scoring high on the GRE should do better in graduate 
school than people who do not score high, given the high g loading of the GRE. 

Systems theorists, on the other hand, would argue that the abilities measured 
by the GRE are only a subset of the abilities needed in graduate school. 
According to this view, the GRE is seen as being of limited value in predicting 
who will succeed in the graduate-school environment. For these reasons, sys- 
tems theorists such as Ceci (1996) have called for empirical verification of the 
predictive validity of the GRE and other similar g-loaded tests and have 
expressed concern that even if these tests do predict worthwhile outcomes, it is 
essential to reveal the mechanisms through which they do so (i.e., to distinguish 
between description and explanation), as well as the contextual factors relevant 
to these predictions. 

After reviewing the literature on what selection tests such as the GRE do and 
don’t do, one can conclude that these tests usually provide a modest amount of 
information about first-year course performance in graduate or professional 
school. But this fact raises the question of whether we should be selecting stu- 
dents on the basis of who will get the best grades in first-year courses. 
Ultimately, what really matters is not first-year grades, but meaningful 
performance as a psychologist, in graduate school and thereafter. How well does the 
GRE predict performance as a professional psychologist, when administered 
prior to starting a graduate program? These were the questions we sought to 
answer by studying the performance of graduate students in the Yale University 
psychology department. 

What Did We Predict, and Why? 

On the basis of Sternberg’s (1985) triarchic theory of intelligence (discussed 
above), which views intelligence in terms of three types of abilities (analytic, 
creative, and practical), we predicted that analytical abilities would be the most 
important for performance on the GRE, in view of its emphasis on factual 
recall, and would predominate in course performance as well, given the way 
courses tend to be taught. Thus, we expected GREs to provide some prediction 
of course grades. The theory also suggests that practical and especially creative 
abilities will be critical for performance as a psychological researcher or practi- 
tioner and that although analytical abilities will also be important, they will not 
hold any privileged position. Moreover, even these abilities will be within a 
domain (psychology) as practiced within a particular context (e.g., university, 
private practice), so that the validity of the GRE for a broad array of domains 
would be open to question. 

Introductory graduate psychology courses usually require the same kinds of 
fairly abstract and often context-lean memorization and analysis that are 
required by conventional tests of ability and achievement. Thus, the GRE, as a 
measure of analytical ability, would be likely to predict course grades. However, 
success in psychology as a career, or even in the later years of graduate training 
in psychology, may require creative and practical abilities in addition to analyti- 
cal abilities. Creative abilities are necessary to formulate theories, empirical 
research, or hypotheses. Practical abilities are required to succeed within the 
university promotion system; to get grants funded; or to attract, keep, and treat 
clients. Thus, we might expect the GRE to be weaker as a predictor of success 
as a psychologist, or even as an advanced graduate student, than we would 
expect it to be as a predictor of initial graduate grades. This weaker relation fol- 
lows from the fact that the GRE measures primarily analytical abilities. Such 
abilities, especially as measured in the context of a standardized test, will be 
only minimally related to creative and practical abilities (see Sternberg, 1985, 
1996; Sternberg, Ferrari, Clinkenbeard, & Grigorenko, 1996). 

We also predicted that (a) the best predictor of graduate grades would be the 
GRE advanced (subject-matter) test in psychology, because the best predictor of 
future achievement of a given kind is past achievement of the same kind; (b) the 
GRE would be a better predictor of first-year graduate grades than of 
second-year grades, because the testing is closer to the first year of performance 
and because first-year courses are less advanced (the more advanced the courses 
become, the more they are likely to draw on skills beyond the conventional 
memory and analytical ones); and (c) the GRE would be only weakly predictive 
or not predictive of more meaningful criteria of graduate-program success, such 
as ratings by professors of various aspects of the quality of students and their 
work, including ratings of students’ demonstrated analytical, creative, practical, 
research, and teaching abilities, as well as of their dissertations. If the GRE pre- 
dicted anything, we expected it to be best for professors’ ratings of analytical 
ability. However, we believed that the correlation would not necessarily be sub- 
stantial, because analytical ability as demonstrated in the actual graduate work 
context might be somewhat different from analytical ability as demonstrated in 
a paper-and-pencil test. 



Method 

To test our hypotheses, we asked all faculty members in the Department of 
Psychology at Yale University who supervised graduate students or who were 
on dissertation committees during the period from 1980 to 1991 (N = 40) to 
provide certain ratings. Graduate advisors were asked to rate their primary grad- 
uate-student advisees (i.e., those for whom they were Ph.D. dissertation supervi- 
sor) for five types of abilities: analytical, creative, practical, research, and teach- 
ing. Evaluations were on a low (1) to high (7) scale for each rating. The faculty 
members were told to use the scale in the following way: 

7 = absolutely superlative (among the very best in our graduate program) 
6 = outstanding (among the top 10 percent in our graduate program) 
5 = excellent (among the top 25 percent in our graduate program) 
4 = very good (among the top 50 percent in our graduate program) 
3 = good (among the top 75 percent in our graduate program) 
2 = fair (among the top 90 percent in our graduate program) 
1 = poor (among the very weakest in our graduate program) 

Obviously, there is some degree of subjectivity in ratings such as these, and 
the possibility of halo effects as well. 

To broaden our criterion information, we obtained the overall evaluations of 
dissertations (for those students who had completed dissertations) from the three 
ratings given by the three dissertation-committee readers (all of whom were 
psychology faculty but none of whom was the primary dissertation advisor). 
These ratings were on a 4-point scale. Because lower ratings corresponded to 
higher evaluations of dissertations, we reflected the ratings for this study to 
make the direction consistent with all the other measures (higher numbers corre- 
sponding to better performance). We also computed GPAs for students’ first 
year, second year, and combined first and second years of graduate training. 
These GPAs were based on a grading system of honors (4), high pass (3), pass 
(2), and fail (0), with failing grades being exceedingly rare. We included in our 
sample all matriculants, including those who had not completed the program. 
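
The two scoring steps just described, reflecting the dissertation ratings and converting letter grades to a GPA, can be sketched as follows. This is a hypothetical illustration, not the code used in the study: it assumes the 4-point dissertation scale runs from 1 to 4, that a student's GPA is the unweighted mean across courses, and that the three readers' ratings are averaged after reflection. The names `reflect`, `gpa`, and `GRADE_POINTS` are invented for the example.

```python
# Grade points as stated in the text: honors (4), high pass (3), pass (2), fail (0).
GRADE_POINTS = {"honors": 4, "high pass": 3, "pass": 2, "fail": 0}


def gpa(grades):
    """Unweighted mean of grade points (weighting is an assumption)."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)


def reflect(rating, low=1, high=4):
    """Reverse a rating so that higher numbers mean better performance.
    On an assumed 1-4 scale: 1 -> 4, 2 -> 3, 3 -> 2, 4 -> 1."""
    return low + high - rating


def dissertation_score(reader_ratings):
    """Mean of the reflected ratings from the dissertation readers."""
    reflected = [reflect(r) for r in reader_ratings]
    return sum(reflected) / len(reflected)


# Made-up example data for one student.
print(gpa(["honors", "high pass", "pass", "honors"]))
print(dissertation_score([1, 2, 1]))
```

After reflection, every measure in the data set points the same way, so a positive correlation always means that higher predictor scores go with better performance.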

We used as predictor variables scores from the Verbal, Quantitative, 
Analytical, and advanced tests of the GRE. The verbal, quantitative, and 
analytical sections are required at Yale; the advanced test is not. Our sample ultimately 
consisted of 170 graduate students. Of these students, 84 had completed their 
dissertations at the time of our study, meaning that for these 84 students we 
were able to obtain the dissertation readers’ evaluations of their dissertations. 
Data on 3 students were incomplete, leading to 167 students in the final sample 
(68 men, 99 women). Because the GRE advanced test is not required at Yale, 
and because some students take the test in a field other than psychology (result- 
ing in the exclusion of their advanced test scores from the sample), the number 
of advanced test scores was reduced (N = 73). 

Results 

There was a good range in GRE scores among the students in the psychology 
department: Verbal test scores ranged from 250 to 800 (M = 653, SD = 97), 
Quantitative test scores from 320 to 840 (M = 672, SD = 78), Analytical test 
scores from 410 to 810 (M = 656, SD = 92), and psychology advanced scores 
from 490 to 850 (M = 690, SD = 65). Advisor ratings ranged from 1 to 7; mean 
advisor ratings ranged from 4.46 on creative ability to 4.75 on analytical ability, 
and standard deviations ranged from 1.45 to 1.65. Year 1 GPA ranged from 2.46 
to 4.00 (M = 3.58, SD = .31), and Year 2 GPA ranged from 2.90 to 4.00 (M = 
3.72, SD = .33). Dissertation reader ratings (the mean of three readers per stu- 
dent) ranged from 1 to 3.33 (M = 3.10, SD = .56). Although the mean GRE 
scores in the mid to high 600s were relatively high and well above the national 
average, the Verbal, Quantitative, and Analytical test standard deviations and 
ranges were also quite high. The standard deviations came close to the national 
ones. Ratings of student performance also varied considerably. Thus, restrictions 
of range cannot be blamed for our results, a point I will return to later. 

When we looked at the results separately for men and women, we found no 
significant differences on any of the measures: Men and women were compara- 
ble in measured abilities and performance in our sample. We found that the sep- 
arate GRE scores were related to each other, with intercorrelations ranging from 
.17 to .58 (for men and women combined). We also looked at intercorrelations 
within the various criteria (averaging the two years of GPA). Overall GPA dur- 
ing the first two years correlated modestly with the other criteria: .41 with ana- 
lytical rating (p < .001), .16 with creativity rating (p < .05), .29 with practical 
rating (p < .001), .29 with research rating (p < .001), and .32 with teaching rat- 
ing (p < .001), as well as .32 with mean dissertation reader rating (p < .01). 
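
As a rough check on how a correlation as small as .16 can reach the .05 level, the sketch below computes the usual t statistic for testing whether a Pearson correlation differs from zero. It assumes that all 167 students contributed to the correlation, which the text does not state; with missing data the actual n, and therefore t, would be smaller.

```python
import math

def t_for_correlation(r, n):
    """t statistic (df = n - 2) for testing that a Pearson correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# r = .16 is the GPA-creativity correlation reported above; n = 167 is an assumption.
t_creativity = t_for_correlation(0.16, 167)   # about 2.08
```

With 165 degrees of freedom the two-tailed .05 critical value is roughly 1.97, so a t of about 2.08 is consistent with the reported p < .05.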

The most important analyses were the correlations of the predictors (GRE 
scores) with the criteria (advisors’ and readers’ ratings as well as grades). These 
data are shown in Table 1, for the sexes combined and separately for men and
women. First, GREs did have some modest value for predicting grades, at least
in the first year of graduate study. The median correlation across the four scores
for men and women combined was .17.








Second, we had assumed a better prediction of first-year grades than of sec- 
ond-year grades in graduate study. In fact, although three of the four correla- 
tions for men and women combined were statistically significant for Year 1, 
none of the correlations for Year 2 was statistically significant (with a median 
across the four correlations of just .02). The psychology advanced test showed a 
significantly higher correlation with Year 1 GPA than with Year 2 GPA (with 
listwise deletion for missing cases), t(61) = 2.48, p = .02 (see Cohen & Cohen,
1983, p. 56, for the formula used to test for the significance of the difference 
between these two dependent correlations). However, the differences between 
the correlations for Year 1 and Year 2 GPA for the other GRE subtests did not 
reach statistical significance. 
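
The cited formula is not reproduced in this article. A common test for two dependent correlations that share one variable (here, one GRE score correlated with Year 1 GPA and with Year 2 GPA in the same students) is the t test often attributed to Williams and Hotelling, and the sketch below assumes that this is the kind of test meant. The function name and the input values are hypothetical. The test has n - 3 degrees of freedom, which would fit the reported t(61) if 64 students had complete data.

```python
import math

def dependent_corr_t(r_xy, r_xz, r_yz, n):
    """t test of H0: corr(x, y) == corr(x, z), all measured on the same n cases.

    r_yz is the correlation between the two criteria. Returns (t, df) with df = n - 3.
    """
    # Determinant of the 3 x 3 correlation matrix of x, y, and z.
    det = 1 - r_xy**2 - r_xz**2 - r_yz**2 + 2 * r_xy * r_xz * r_yz
    r_bar = (r_xy + r_xz) / 2
    numerator = (n - 1) * (1 + r_yz)
    denominator = 2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r_yz) ** 3
    return (r_xy - r_xz) * math.sqrt(numerator / denominator), n - 3

# Hypothetical values, not taken from the study.
t_stat, df = dependent_corr_t(0.5, 0.3, 0.4, 100)
```

Because the two correlations come from the same students, this test takes the correlation between the two criteria into account. A test for independent samples would not be appropriate here.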

Third, the psychology advanced test correlated strongly with Year 1 GPA
(.37). Test-measured achievement was thus a strong predictor of grade-based
achievement (as we predicted). The GREs, then, did provide modest
prediction of grades in the first year; however, grades represent one of the least
important aspects of graduate performance. We did not find prediction of
second-year grades.

Fourth, with one exception, the GREs were not useful as predictors of other 
aspects of graduate performance: ratings of analytic, creative, practical, 
research, and teaching abilities by primary advisors and ratings of dissertation 
quality by faculty readers. For the combined sexes, only 4 of 24 correlations 
reached statistical significance. The median was only .12. We had expected that 
correlations would be higher with the analytical than with the advisors’ other 
ratings, but given the level of the correlations, there just was not enough relation 
for many of the correlations to be significantly different from zero, much less 
from each other. Thus, as systems views of intelligence would predict, GREs 
were generally not valid or otherwise useful predictors of important aspects of 
success. 

Fifth, it turns out that the four statistically significant correlations for the 
combined sexes cannot quite be taken at face value, because the table shows 
that in every case the effect is due to correlations for men, but not for women. 
There was, in fact, one consistently successful (although only marginally signif- 
icant) predictor of ratings: the GRE Analytical test score for men only (z = 1.95, 
p < .06, two-tailed; see Cohen & Cohen, 1983, p. 53, for the formula used to 
test for the significance of the difference between the independent correlations 
for men and women). (The only other significant correlations were for the GRE 
Quantitative test score predicting advisor’s analytical rating for men, and for the 
GRE Quantitative test score predicting advisor’s creative rating for women, but 
in the negative direction.) 
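
The comparison of men's and women's correlations involves two independent samples, for which the standard approach is to convert each correlation with Fisher's r-to-z transformation and divide the difference by its standard error. The sketch below assumes that this is the test referred to. The group sizes of 68 and 99 come from the sample description, but the two correlations are hypothetical, and the n actually available for any given rating may have been smaller.

```python
import math

def independent_corr_z(r1, n1, r2, n2):
    """z test of H0: two correlations from independent samples are equal."""
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher r-to-z transformation
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / standard_error

# Hypothetical correlations for 68 men and 99 women.
z_stat = independent_corr_z(0.5, 68, 0.2, 99)
```

A z of 1.96 or more in absolute value corresponds to p < .05, two-tailed, which is why the reported z = 1.95 is described as only marginally significant.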

We also investigated several other data-analytic approaches that had the 
potential to improve the GRE’s predictive power. For example, we conducted 
multiple regressions that optimally combined the various GRE scores, both with
and without the advanced test. The results were substantially unchanged:
Combination of multiple scores did not significantly improve prediction. We 
also did canonical regressions, thereby using multivariate analysis linearly to 
combine dependent as well as independent variables in an optimal way. Again, 
we obtained no significant improvements in prediction. 

In sum, GREs were found to be modest predictors of first-year but not sec- 
ond-year grades in our graduate program, both for men and for women. 
However, only the GRE Analytical test score was found to predict more
consequential evaluations of student performance, and only for men.

Criticisms of Our Study 

One potential criticism of our study concerns the issue of range restriction. 
However, restriction of range cannot be fully blamed for the pattern of correla- 
tions we obtained. First, as noted earlier, our standard deviations and ranges 
were rather substantial. Second, the fact that significant correlations were 
obtained between GREs and first-year grades, and between the GRE Analytical 
test score for men and the professors’ ratings, suggests that correlations could be 
obtained where they existed. Third, good prediction of grades was found for the 
GRE subtest with the lowest scaled-score standard deviation (the psychology 
advanced test).¹
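
The article does not say which correction for restriction of range was tried. One widely used formula for direct restriction on the predictor (often called Thorndike's Case 2) is sketched below as an illustration of how such a correction works. The function name and the standard deviations in the example are hypothetical.

```python
import math

def correct_for_range_restriction(r, sd_unrestricted, sd_restricted):
    """Estimate the correlation in the unrestricted group from the restricted-sample r."""
    u = sd_unrestricted / sd_restricted
    return r * u / math.sqrt(1 - r * r + r * r * u * u)

# Hypothetical: observed r = .17, with a population SD 1.2 times the sample SD.
r_corrected = correct_for_range_restriction(0.17, 120, 100)
```

When the sample standard deviation is close to the population value, as the text reports for the GRE scores in this sample, the ratio is near 1 and the correction changes the correlation very little.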

A second criticism concerns the unreliability of faculty ratings. Some might 
argue that any kind of subjective rating is notoriously unreliable, so that one 
could hardly expect any test, the GRE included, to show substantial correlations 
with such unreliable and possibly invalid criteria. In fact, grades correlated with 
the ratings, the ratings correlated with each other, and the GRE Analytical test 
score correlated with the ratings for men. Consequently, it was possible to 
obtain correlations with the ratings, suggesting unreliability was not responsible 
for the failure of the GRE to correlate with the ratings. 

A third criticism is that Yale University graduate students are not typical of 
graduate students in general. It is possible that our findings might not replicate 
for other programs and other graduate students. It is also possible that Yale’s 
graduate-program emphasis might be unrepresentative. However, like most
graduate programs, Yale’s clinical training emphasizes the Boulder model of the 
scientist-practitioner, and Yale’s nonclinical track trains students for industry 
and government positions as well as for traditional academic jobs. Furthermore, 
the range in GRE scores of admitted applicants provides additional testimony to
the diversity of the student body. (Note that lack of proficiency in English by
foreign students was rare in Yale’s psychology department; but, nevertheless,
the analysis performed on GRE Quantitative test scores alone addresses the
possibility that our results would change if we ignored verbal proficiency in
English.)

¹ Although we tried to correct for restriction of range, we acknowledged that Yale’s students are
not typical of all students who enter psychology graduate programs and that different results
might be obtained across the entire range of students in all programs. We also acknowledged that
our relatively small sample size restricted both the power of the significance tests to detect actual
differences or relationships and the generalizability of our results to future psychology graduate
students at Yale and at other schools, as well.



Conclusions of Our Study 

The results of our study underscored the need for serious validation studies 
of the GRE, not to mention other admissions indices, against measures of conse- 
quential performances, whether of students or of professionals. The point is that 
we should apply the same standards of falsifiability in our admissions process as 
we do in our scientific work. Sometimes, the use of a test can become self- 
perpetuating, without serious attempts to verify its effectiveness. Our study sug- 
gested the need to reflect on our use of tests before they become firmly and 
even irrevocably entrenched. Psychologists regularly create, refine, and export 
standardized tests for use throughout the academic community and our society 
in general. Thus, psychologists should remain aware of the issues revealed by 
this study and work to ensure that the tests we advocate are used effectively and 
appropriately. 

An obvious direction for the tests of the future is to expand upon the types of 
abilities or intelligences assessed (e.g., practical and creative intelligence). 
Theories such as Sternberg’s (1985) triarchic theory, Ceci’s (1996; Ceci & 
Roazzi, 1994) bio-ecological theory, or Gardner’s (1983, 1993) multiple intelli- 
gences theory might be used as bases for the development of expanded tests, 
both paper-and-pencil and performance-based. To guide this future work, psy- 
chologists need to remain reflective about what is meant by the concept of intel- 
ligence as it applies in the context of success in a graduate program in psycholo- 
gy (see Neisser, 1979; Sternberg, 1985). 

In conclusion, this research found the traditional psychometric approach to 
predicting graduate-school achievement, represented by the GRE, to be somewhat
lacking in prediction of meaningful aspects of success in graduate school. Like 
other studies that have examined related g-loaded intelligence tests (the GRE, 
SAT, PSAT, LSAT, and GMAT), we found that the GRE is best at predicting 
grades earned in the semesters immediately following admission. Thus, in gen- 
eral, when we select students for admission and financial aid awards based on 
high GRE scores, we are selecting students most likely to do well in course 
work but not necessarily more likely to do well in research and teaching than 
applicants with lower GRE scores. This, in a sentence, is the likely implication 
for graduate students of the psychometric view of intelligence and the use of 
tests constructed on the basis of this view. Further research is warranted so that 
we can develop better predictive tools that provide more and better information 
about meaningful aspects of performance. Perhaps some of these tests of the
future will be based on systems-oriented views of intelligence. I now discuss
one example of an educational program that used a systems approach as the 
basis for instruction and assessment. 

One Educational Program Based on a Broader Conception of Intelligence

What happens when one applies a systems theory of intelligence to class- 
room instruction and assessment? To investigate this question, Howard Gardner, 
Tina Blythe, Noel White, and Jin Li at Harvard University collaborated with 
Robert Sternberg and myself at Yale University on the Practical Intelligence for 
School (PIFS) Project (Williams et al., 1996, 1997). This work began with the 
observation that possession of analytical or academic intelligence does not 
always lead to success in school. Children also need practical and creative 
thinking skills. In the world after graduation from school, practical and creative 
skills are likely to become even more essential to success (see Sternberg et al.,
1995).

Consider an experience shared by many parents: A child with good reading 
and writing skills and a solid vocabulary hands in a messy composition, filled 
with cross-outs, after insisting to her parents that her teacher said that what really
counts are the child’s ideas. A week goes by, and the child receives a poor
grade — and is shocked. This child is intelligent in the traditional, g-based 
sense and does well on tests of intelligence (see Neisser, 1979; Neisser et al.,
1996). Yet, the child seems to lack some kind of intelligence relevant to the
school environment, what my associates and I called “practical intelligence for
school” (Gardner, Krechevsky, Sternberg, & Okagaki, 1994; Sternberg,
Okagaki, & Jackson, 1990; Williams et al., 1996). Students with practical
intelligence for school both understand and are able to respond appropriately to the
demands of the school environment, which include doing homework, taking 
tests, reading for understanding, and writing effectively. 

Practical intelligence for school is a specific aspect of the more general
construct of practical intelligence, which has been studied by a variety of
investigators in a range of contexts (see Ceci, 1996; Sternberg & Wagner, 1986;
Sternberg et al., 1995, for reviews). This research has shown that people have a
set of procedural-knowledge skills that are relevant to their adaptation to
real-world environments. These procedural-knowledge skills are not well
or fully conceptualized by conventional notions of intelligence and are not well
measured by conventional intelligence tests. Some do not accept that practical
intelligence exists (e.g., Ree & Earles, 1993; Schmidt & Hunter, 1993). 
However, my associates and I believed that there was sufficient evidence for 
such a construct that it was worth pursuing its applications to the classroom 
environment. Practical intelligence can be viewed as a part of intelligence, 
broadly defined, whether or not it is viewed as wholly distinct from the academic
aspects of intelligence.

Can Intelligence Be Taught? 

Given the fuzzy nature of this question, it is not surprising that the scientific 
evidence has been mixed and subject to alternative interpretations. On the one 
hand, a number of controlled studies have yielded impressive gains. For exam- 
ple, the ODYSSEY project was designed to raise the intellectual skills and 
school performance of Venezuelan school children of roughly middle-school age 
and was evaluated with highly favorable results by Herrnstein, Nickerson,
DeSanchez, and Swets (1986). Ramey (1994; Ramey & Campbell, 1984, 1992; 
Ramey et al., 1992), studying younger children, also accumulated substantial 
evidence that gains in intelligence and school performance are possible as a 
result of intensive interventions among high-risk preschoolers. Other programs 
have also shown at least limited, and sometimes quite impressive, success 
(Bereiter & Engelmann, 1966; Feuerstein, 1980; Garber, 1988; Nisbett, Fong, 
Lehman, & Cheng, 1987; Schweinhart, Barnes, & Weikart, 1993; see, also, 
Detterman & Sternberg, 1982; Honig, 1994; Nickerson, 1994; Nickerson, 
Perkins, & Smith, 1985). 

Some believe, however, that it is not possible to increase intelligence. 
Herrnstein and Murray’s (1994) review of the literature led them to conclude
that little meaningful gain is possible; others have come to the same conclusion 
(Jensen, 1969, 1989; McLaughlin, 1977; Spitz, 1986, 1992). In the middle of 
the road are studies that are encouraging, but cautious, in their interpretations 
(Consortium for Longitudinal Studies, 1983; Lazar & Darlington, 1982; Snow 
& Yalow, 1982; Zigler & Berman, 1983). In sum, no serious psychologist has 
suggested that unlimited gains in intelligence are possible; however, some 
believe that modest to moderate gains are possible in some cases under limited 
circumstances. 

Theoretical Motivation 

The theoretical motivation for our research was a combination of the theory 
of multiple intelligences proposed by Gardner (1983, 1993) and the triarchic 
theory of intelligence proposed by Sternberg (1985, 1996). We integrated the 
two theoretical frameworks by viewing intelligence in the seven domains pro- 
posed by Gardner as having the three aspects proposed by Sternberg. Consider, 
for example, the case of linguistic intelligence, an important component of 
school success. Linguistic intelligence may be seen as encompassing analytical 
aspects (e.g., in understanding how to develop a logically consistent argument), 
creative aspects (e.g., in writing a creative essay or a poem), and practical 
aspects (e.g., in writing or speaking persuasively to one’s teacher or fellow stu- 
dents). The same merging applies to each of the multiple intelligences. 





In this set of studies, we sought to boost school achievement by creating an 
intervention that would develop practical intelligence for school for 
middle-school-age students (Williams et al., 1996). The PIFS program is a cur- 
ricular intervention designed to enhance the practical-thinking skills of fifth- 
and sixth-grade students. We focused on fifth- and sixth-grade students because 
we believed that the point at which a child leaves primary school and enters 
intermediate school is a time when the child is ripe for instruction in practical 
thinking skills. The child at this juncture is old enough to assimilate and use the 
skills, but young enough to be open to learning them. Also, the child’s practical 
thinking skills will become more essential as she or he enters the intermediate 
school environment, in which she or he must change classes several times a day 
and deal with the demands of different teachers. The PIFS Project involved 
intensive observations and interviews of students and teachers to determine the 
tacit knowledge necessary for success in school. 

How did we uncover the practical thinking skills essential to school success? 
We began by developing a taxonomy of practical thinking skills. This taxonomy 
consisted of five themes — knowing why, knowing self, knowing differences, 
knowing process, and reworking — which were applied to practical thinking in 
four domains — reading, writing, homework, and test taking. To illustrate, con- 
sider the taxonomy applied to skills in test taking: The first theme is knowing 
why. Students master this area of the test-taking curriculum by answering ques- 
tions such as What are the roles of tests in and out of school? How does testing 
relate to other class work? The second theme is knowing self, for which stu- 
dents learn how to recognize current study strategies and test-taking practices 
and identify personal strengths and weaknesses in terms of testing. 

The third theme is knowing differences, for which students learn how to rec- 
ognize different kinds of tests and test questions, within and across subjects, 
learn what each test can and cannot determine about the test taker, and learn
different strategies that are appropriate for each test. For the fourth theme,
knowing process, students come to understand that long-term preparation is necessary
in preparing for tests, and they learn both long-term and short-term strategies for
test preparation, as well as strategies for solving problems during actual test tak- 
ing. The fifth and final theme, reworking, involves students’ using the results of 
tests as an opportunity for self-reflection and as a stepping stone toward more 
productive learning and test taking. 

The PIFS curriculum book, distributed to teachers, contained an overview of 
the scientific basis of the research, teacher training materials, and 35 one- to 
three-hour-long lessons that could be adapted by teachers to their students’ 
needs. The curriculum was implemented twice over two consecutive years in 
schools in Connecticut (n = 193 students) and Massachusetts (n = 321 students) 
in a matched-control-group design. We developed pre- and posttests designed to 
assess the quality of students’ practical knowledge in each of four focal areas
covered by the curriculum (reading, writing, homework, and testing). These
tests were administered in October and June. All of the tests were based on the 
kinds of tasks students are typically asked to do in school to make them fair to 
students who had not been exposed to the curriculum. 

For example, we used two 50-minute reading assessments, one based on a 
factual passage and the other on a passage of fiction. Students read the passage 
and answered questions about their general understanding, their thinking 
processes while reading, the parts they found easy or hard to understand and 
why, how they would study for a test on the passage, and so on. Most of the 
questions were open-ended. The writing assessment involved two parts, each of 
which lasted 50 minutes. The first part of the pretest asked students to write a 
composition describing in detail a place they knew well. For the posttest, stu- 
dents described a person they knew well. 

Following the writing, students answered questions about their writing 
process — what was easy or hard, how they got their ideas and organized their 
presentation, and what their teacher’s reaction might be. For the second part of 
the writing assessment, students revised the composition they wrote on the first 
part. They then reflected on the revision process, indicating the parts they had 
added or deleted, explaining those changes, and predicting what the teacher 
might and might not like about the piece. Each assessment thus yielded tradi- 
tional academic measures of intelligence (e.g., the grammatical correctness of an 
essay) along with measures of practical intelligence (e.g., the persuasiveness of 
an essay). 



Results 

The results of the PIFS curriculum evaluations were uniformly positive. Analyses
of covariance (ANCOVAs) were conducted on all academic- and practical- 
intelligence variables, using the pretest score for each measure as the covariate, 
and comparing the Fall-to-Spring score changes for the PIFS and control-group 
students. In general, the PIFS program successfully enhanced practical and aca- 
demic skills. Positive results were observed in both years of the program in 
Studies 1 and 2 at Connecticut and Massachusetts sites. The PIFS effect 
occurred across a variety of initial conditions in which the PIFS group began 
either lower than, equal to, or higher than the control group. 
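
The logic of an analysis of covariance with the pretest as covariate can be sketched as a comparison of two regressions of posttest on pretest, one that allows for a group difference and one that does not. The code below is a minimal illustration of that technique, assuming a single covariate and a common slope across groups. It is not the analysis code used in the study, and the function names and data are hypothetical.

```python
def _cross_products(xs, ys):
    """Sum of cross-products of deviations from the means (a sum of squares when xs is ys)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

def ancova_f(groups):
    """One-covariate ANCOVA F test for a group effect.

    groups: list of (pretest_scores, posttest_scores) pairs, one pair per group.
    Returns (F, df_between, df_error).
    """
    # Pooled within-group sums of squares and cross-products.
    sxx_w = sum(_cross_products(pre, pre) for pre, _ in groups)
    syy_w = sum(_cross_products(post, post) for _, post in groups)
    sxy_w = sum(_cross_products(pre, post) for pre, post in groups)
    # The same quantities ignoring group membership.
    all_pre = [x for pre, _ in groups for x in pre]
    all_post = [y for _, post in groups for y in post]
    sxx_t = _cross_products(all_pre, all_pre)
    syy_t = _cross_products(all_post, all_post)
    sxy_t = _cross_products(all_pre, all_post)
    # Residual posttest variation after adjusting for the pretest.
    sse_full = syy_w - sxy_w**2 / sxx_w        # model with a group term
    sse_reduced = syy_t - sxy_t**2 / sxx_t     # model without a group term
    df_between = len(groups) - 1
    df_error = len(all_pre) - len(groups) - 1
    f_value = ((sse_reduced - sse_full) / df_between) / (sse_full / df_error)
    return f_value, df_between, df_error

# Hypothetical data: the treatment group gains one point more than the control group.
control = ([1, 2, 3, 4], [1.1, 1.9, 3.1, 3.9])
treatment = ([1, 2, 3, 4], [2.1, 2.9, 4.1, 4.9])
f_value, df_between, df_error = ancova_f([control, treatment])
```

With two groups the error degrees of freedom are N - 3. When the pretest is the covariate, the F test for the group effect is the same whether the dependent variable is the posttest score or the pretest-to-posttest change, because the two differ only by the covariate itself.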

Consider representative results from the Year 2 Connecticut-site data (these 
results are quite similar to the Massachusetts-site results). The means for indi- 
vidual variables (academic and practical measures of reading, writing, home- 
work, and test-taking ability) ranged from 2.29 (SD = .70) to 4.09 (SD = .69) on 
the 5-point rating scale, ranging from poor (1) to excellent (5). For academic 
writing ability, for example, the PIFS (treatment) group showed a significant
pretest-to-posttest increase, t(51) = 6.67, p < .001. The control group also
increased, but the change was not significant, t(49) = 1.08. An ANCOVA
showed that pretest score was a significant covariate and that there was a
significant PIFS effect, covariate F(1, 99) = 60.22, p < .001. The PIFS group’s gains
were significantly greater than the control group’s gains, F(1, 99) = 16.49, p <
.001.² For practical writing ability, for example, both the PIFS and control
groups showed significant increases from pretest to posttest, respectively, t(51)
= 9.89, p < .001, and t(49) = 2.61, p < .05. An ANCOVA showed that pretest
score was a significant covariate and that there was a significant PIFS effect,
covariate F(1, 99) = 88.19, p < .001. The PIFS group’s gains were significantly
greater than the control group’s gains, F(1, 99) = 25.33, p < .001.³ For an
overall practical-intelligence measure, the combined summary score showed both
PIFS and control groups with significant pretest-posttest gains, respectively,
t(53) = 14.32, p < .001, and t(51) = 3.24, p < .01. An ANCOVA showed that the
pretest score was a significant covariate and that there was a significant PIFS
effect, covariate F(1, 103) = 214.61, p < .001. Once again, the PIFS group’s
gains were significantly greater than the control group’s gains, F(1, 103) =
60.89, p < .001.⁴

Thus, in general, we found that the PIFS program successfully enhanced 
both practical and academic skills in each of the target skill areas (reading, writ- 
ing, homework, and test taking) in children from diverse socioeconomic back- 
grounds attending diverse types of schools. In addition, teachers, students, and 
administrators alike reported fewer behavioral problems in PIFS classes.

² The mean covariate-adjusted improvement from the mean pretest score (of the whole sample)
was .66 points for PIFS participants and .12 points for the control participants. This represented a
26 percent increase at posttest for the PIFS participants compared with a 5 percent increase for
control participants. After removing variance in posttest scores accounted for by pretest scores,
the PIFS treatment variable accounted for 10 percent of the total variance in posttest scores and 14
percent of the variance in regressed change.

³ The mean covariate-adjusted improvement from the mean pretest score (of the whole sample)
was .82 points for PIFS participants and .24 points for the control participants. This represented a
32 percent increase at posttest for the PIFS participants compared with a 9 percent increase for
control participants. After removing variance in posttest scores accounted for by pretest scores,
the PIFS treatment variable accounted for 12 percent of the total variance in posttest scores and 20
percent of the variance in regressed change.

⁴ The mean covariate-adjusted improvement from the mean pretest score (of the whole sample)
was .76 points for PIFS participants and .20 points for the control participants. This represented a
29 percent increase at posttest for the PIFS subjects compared with an 8 percent increase for
control participants. After removing variance in posttest scores accounted for by pretest scores, the
PIFS treatment variable accounted for 17 percent of the total variance in posttest scores and 37
percent of the variance in regressed change.

Conclusions

This research demonstrated that the role of tacit knowledge in school success
is central, and significantly, this research showed that tacit knowledge can be
effectively defined, efficiently taught, and used by students to improve their
performance in school. This is welcome news for students who fall short of their
potential because they lack basic practical insights into their teachers’ expecta- 
tions and how to fulfill these expectations. For teachers, the possibility of train- 
ing practical intelligence for school may mean less frustration with students who 
do not perform because of an array of factors not related to lack of analytical 
ability. The PIFS curriculum provides one method for and approach to training 
these essential practical skills. Students exposed to PIFS become better able to 
make optimal use of their gifts and abilities within the context of the school 
environment, while learning practical skills they can use throughout their lives. 

Over the past several decades there have been various trends favoring differ- 
ent types of curricular approaches and interventions. Sometimes, curricula are 
developed and implemented on the basis of anecdata, or anecdotal reports, 
instead of carefully controlled studies. For example, a recent trend in education- 
al intervention has focused on building emotional and moral intelligence (e.g., 
Coles, 1996). Lately, it is often said that we must educate our youth for charac- 
ter and moral values (e.g., the Character Counts curriculum in use across the 
United States). Certainly, we are not suggesting that there is anything wrong 
with wanting children to develop character and morality. However, the questions 
of exactly how to accomplish this goal (or any other educational goal), and of 
how to know if we are succeeding with any individual program, can only be 
answered in a scientifically adequate way through empirical research. 

Unlike the latest fad that sometimes becomes a curricular intervention in the 
absence of a solid theoretical foundation and rigorous supporting data, the PIFS 
program is rooted in theory and based on hard empirical evidence. Our data 
show something meaningful and promising by providing evidence that it is pos- 
sible to improve broad-based intellectual skills. By focusing on reading, writing, 
homework, and test-taking ability, we cast our net wide in an attempt to create 
meaningful changes in broad areas of students’ intellectual performances. 

As I have already discussed, there has been widespread disagreement regard- 
ing the degree to which children’s intellectual capabilities can be modified. For 
example, Herrnstein and Murray (1994) essentially dismissed intervention
effects, arguing that short of adoption, there are no meaningful ways to raise the 
intellectual performance of children. Although future research is needed to 
assess the long-term durability of training in practical intelligence, our data 
showed reasonable increases over the school year. Thus, on the topic of the
controversy regarding potential intervention effects, we weigh in on the side of
cautious optimism. At the very least, our data suggest that further research on
increasing practical intelligence is warranted.

A broader issue raised by this research concerns the definition of intelligence 
itself and how one’s definition affects one’s viewpoint regarding the modifiabili- 
ty of intelligence and how best to enhance it. On the theoretical side, a growing 
literature suggests that traditional g-based psychometric conceptions of
intelligence are incomplete. Interest in the type of intelligence people use to solve
real-world problems, referred to here as practical intelligence, has led to a broad
cross section of studies identifying practical intelligence in different domains
(Ceci, 1996; Rogoff & Lave, 1984; Scribner & Cole, 1981; Sternberg &
Wagner, 1986, 1994; Sternberg et al., 1993; Voss et al., 1991) and even studies
showing that practical intelligence can be assessed and taught (see Sternberg et
al., 1995, for a review).

Thus, theoretically speaking, a psychometrically based measure of intelli- 
gence such as g may not be the whole story when it comes to understanding 
children’s intelligence. Practically speaking, psychometrically based g may not 
be sufficient if we wish to be fair and accurate in our assessment of children’s 
capabilities in the classroom and beyond. Just as Renzulli (1986) has argued for 
a broad-based approach to identifying giftedness — in which children are iden- 
tified on the basis of not only above-average ability, but also high levels of 
motivation and creativity — we argued for a broad-based approach to identify- 
ing school-based competence. We believed that practical intelligence should be 
seen as an essential component of children’s competence, worthy of assessment 
and instruction in its own right. Our research suggests that training in practical 
intelligence can help children remediate areas of weakness, as well as build on 
existing skills, to improve their performance in many academic areas. 

The evidence regarding the modifiability of g-based intelligence is mixed. 
But if we can accept that intelligence is more than g, there is hope that meaning- 
ful increases in intelligence can be achieved, even if, for example, these increas- 
es do not focus on g-based abilities. Thus, putting aside one’s point of view 
regarding whether we can affect measures of g through training, our research 
shows that we can affect measures of practical intelligence through training. 
Whether one wishes to call practical intelligence a type of intelligence, a type of
knowledge, or a set of skills, the point is that it can be delineated and
taught successfully.

The training of practical intelligence may be particularly useful in challenged 
populations, because students in these populations may have had little opportu- 
nity to acquire school-relevant practical intelligence on their own or at home. 
Training challenged students may help them to overcome a deficit in 
school-related knowledge and skills that could otherwise have derailed them. 
Many of these students may have latent capabilities that they have not harnessed 
or profited from because of a lack of fit between the student and the school 
environment. By helping students understand what is expected of them and why, 
and by demystifying the process of succeeding in school, training in practical 
intelligence may help to reach students who have previously opted out of the 
school experience. 

In conclusion, this research showed that practical intelligence can be identi- 
fied, assessed, and taught, in order to achieve meaningful increases in real-world 
success in the classroom. What are the implications of the broader definition of 
intelligence that provided the theoretical motivation of this research? First, our